ABSTRACT

Factors that contribute to useful measurement of the 
impact of educational programs on college students are reviewed. 
Chapters cover the following: goals of student outcomes assessment; 
philosophy of assessment; outcome taxonomies; issues of measurement 
in talent development assessment; cognitive outcome instruments; 
increasing the usefulness of outcomes assessments; and practical 
suggestions for conducting assessments. It is suggested that an 
institutional program of assessing student outcomes should be based 
on a coherent philosophy of institutional mission and should reflect 
a conception of what constitutes effective performance of that 
mission. Two commonly used approaches to defining excellence are 
discussed, reputational and resource approaches, but a "talent
development" approach is proposed in which assessment focuses more on
changes or improvements in students' performance from entry to exit.
Talent development assessments may be conducted with either standard, 
commercially available assessment instruments or with locally 
designed instruments developed on campus. Reasons why assessments may
not live up to their potential as management tools are addressed, 
including inadequate conceptualization or political barriers. 
Appended is a summary of the more than 25 cognitive assessment
instruments discussed (general education tests, specific skills 
tests, and subject matter competency). This document contains 
approximately 120 references. (LB) 

EXECUTIVE SUMMARY



Concern with outcomes assessment is by no means new in 
postsecondary education. Researchers, practitioners, and policy 
makers have long urged colleges and universities to measure 
the impact of their educational programs (see, for example, 
Bowen 1974). Recent national reports (e.g., Study Group
1984) highlight the promise and potential of outcomes assess- 
ments as tools for institutional self-improvement. But will the 
benefits derived from these assessments justify their costs? This 
monograph describes the factors that contribute to useful out- 
comes assessments. 

A useful assessment has several distinguishing characteris- 
tics. First, the assessment produces data relevant to issues fac- 
ing educational practitioners today. Second, the assessment 
provides information about students' change and development, 
not only an isolated snapshot of student competencies at a sin- 
gle time. Third, the longitudinal data include information about 
students' educational experiences so that the effects of these
experiences can be assessed. Finally, the results are analyzed and
presented in a manner that facilitates their use by practitioners. 

Why Study Student Outcomes? 

While the assessment of student outcomes has many advocates, 
experience has shown that such assessments often fail to live 
up to initial expectations about their usefulness. This gap be- 
tween promise and performance often occurs because of unclear 
or conflicting expectations about the goals and purposes of the 
research. A careful consideration of the goals of assessment is 
essential if research methods and measures are to be matched to 
institutional goals and expectations. The goals of assessment 
may include establishing accountability for external agencies, 
analyzing cost effectiveness, evaluating and developing pro- 
grams, setting goals, marketing, and undertaking strategic plan- 
ning and basic research. 

What Is Excellence? 

Any attempt to implement an institutional program of assessing 
student outcomes should be based on some coherent philosophy 
of institutional mission. In particular, the assessment program 
should reflect some conception of what constitutes effective 
performance of that mission. And effective performance is of
course closely allied to concepts of institutional quality or ex- 
cellence. 

"Excellence" and "quality" are perhaps the most fashionable
terms in discussions of education these days. But even
though many of us are fond of talking about excellence, we
seldom take the trouble to define what we mean by excellence.

The two most commonly used approaches to defining excellence
can be labeled as the reputational and resource approaches
(Astin 1985). The reputational view holds that
excellence is equated with an institution's rank in the prestige
pecking order of institutions as revealed, for example, in periodic
national surveys. The resource approach holds that excellence
is equated with such criteria as the test scores of entering
freshmen, the endowment, the physical plant, the scholarly pro- 
ductivity of the faculty, and so on. These approaches are mu- 
tually reinforcing in the sense that enhanced reputation can 
bring an institution additional resources, and additional re- 
sources like highly able students and a nationally visible faculty 
can enhance an institution's reputation. 

Perhaps the major limitation of these traditional approaches 
is that they do not necessarily reflect higher education's most 
fundamental purpose: the education of students. If one accepts 
the idea that higher education's principal reason for being is to 
develop the talents of students, then "quality" or "excellence" 
should reflect educational effectiveness rather than mere reputa- 
tion or resources. This alternative conception of excellence can 
be labeled the "talent development" view (Astin 1985). Under 
the talent development view, then, a high-quality institution is 
one that maximizes the intellectual and personal development 
of its students. 

These alternative views have important implications for insti- 
tutional assessment. Under the reputational and resource ap- 
proaches, attention is focused on the caliber of the entering 
students as reflected in standardized admissions test scores and 
high school grade averages. Students who are high achievers 
are thus viewed as an important institutional "resource," which 
also tends to enhance the institution's reputation. Under a talent 
development approach, on the other hand, assessment focuses 
more on changes or improvements in students' performance 
from entry to exit. 

How Can I Apply a Talent Development 
Approach on My Campus? 

In actual practice, the talent development approach might be 
applied to an individual campus somewhat as follows: Newly 
admitted students would be tested to determine their entering
level of competence for purposes of counseling and placement. 
These initial scores would be useful not only in providing in- 
formation about a student's specific strengths and weaknesses 
but also in establishing a baseline against which to measure that 
student's subsequent progress. After the student completes a
course of study, the same or similar assessments would be repeated,
and the differences in performance would provide critical
information on the student's growth and development, not
only to the student but also to the professor and institution.

Outcomes assessment from a talent development perspective 
is characterized by longitudinal ("pretest, posttest") designs, in
which a group of students are tested with the same (or compa- 
rable) measures at different times, thereby providing measures 
of growth and change over time. The talent development ap- 
proach does not depend on the use of any particular method of 
assessment. Objective tests, essays, interviews, departmental
examinations, work samples, performance examinations,
and any other devices might be appropriate, depending on the
content and objectives of the curriculum or program being
assessed.
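
The pretest-posttest logic described above can be illustrated with a minimal sketch. The student identifiers, scores, and the function name `talent_gains` are hypothetical and are not drawn from the monograph; the sketch assumes that the same (or a comparable) measure is scored on one scale at entry and at exit.

```python
from statistics import mean


def talent_gains(pretest, posttest):
    """Return each student's gain (posttest minus pretest).

    Only students tested at both times are included, because a gain
    cannot be computed from a single score.
    """
    return {sid: posttest[sid] - pretest[sid] for sid in pretest if sid in posttest}


# Hypothetical scores on the same measure at entry and at exit.
entry_scores = {"s1": 52, "s2": 61, "s3": 70}
exit_scores = {"s1": 64, "s2": 66, "s3": 79}

gains = talent_gains(entry_scores, exit_scores)
mean_gain = mean(gains.values())
```

The individual gains could be reported to the student and professor, while the mean gain is one possible summary at the institutional level.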

Talent development assessments may be conducted with 
either standardized assessment instruments that are commer- 
cially available from testing organizations or with locally de- 
signed instruments developed by faculty and institutional 
researchers on campus. Standardized assessment instruments of- 
fer the user several advantages, including established reliability 
and validity, comparative and normative data, and efficiency in 
administration and analysis as a result of services from ven- 
dors. On the other hand, standardized instruments are unlikely 
to be useful if the testing organization and the potential user 
differ in the manner in which they define key concepts. Fur- 
ther, locally designed instruments provide an opportunity to in- 
volve faculty, administrators, staff, and students in a 
collaborative effort to reflect upon and define key educational 
objectives. 

This review indicates four recurring methodological issues 
that influence the suitability of standardized instruments for talent
development purposes: (1) the likelihood that students will
bottom out or top out, thereby losing the ability to make valid 
longitudinal or cross-sectional comparisons; (2) the availability 
of item scores in addition to scale and total scores; (3) the va- 
lidity of results on the individual level as well as the aggregate 
level; and (4) the availability of absolute rather than relativistic
measures of performance. In addition, longitudinal assessments 
may be weakened by a variety of methodological confounds 
(Cook and Campbell 1979), such as the effects of maturation, 
instrumentation, testing, and statistical regression. Each
confound reduces the likelihood that the outcomes assessment 
accurately measures the effect of the educational program.
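
As a rough illustration of the first issue, the following sketch checks how many students already sit at the lowest or highest possible score on a pretest. The score range, the data, and the function name `share_at_limits` are assumptions made for the example; no threshold for an acceptable share is implied by the monograph.

```python
def share_at_limits(scores, min_score, max_score):
    """Return (share at the floor, share at the ceiling) for a list of scores."""
    n = len(scores)
    if n == 0:
        raise ValueError("scores must not be empty")
    at_floor = sum(1 for s in scores if s <= min_score)
    at_ceiling = sum(1 for s in scores if s >= max_score)
    return at_floor / n, at_ceiling / n


# Hypothetical pretest scores on an instrument scored from 0 to 40.
pretest_scores = [40, 38, 40, 12, 40, 35, 0, 40]
floor_share, ceiling_share = share_at_limits(pretest_scores, 0, 40)
```

When a large share of entering students top out, as in this invented data set, the instrument leaves them no room to show improvement at exit, so measured gains would understate their growth.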

This monograph also briefly describes over 25 tests that may 
be considered by institutions interested in assessing student out- 
comes. These instruments fall into three broad categories: (1) 
integrated packages for assessment of "general education," (2)
instruments designed to assess a particular skill of importance 
in higher education, and (3) subject matter competency tests.

How Can I Increase the Usefulness of Outcomes 
Assessments on My Campus? 

A successful student outcomes project not only measures im- 
pact—it also produces impact. The successful project becomes 
a tool for administrators, trustees, faculty, students, and exter- 
nal reviewers to use in evaluation and decision making. Yet all 
too often outcomes assessments fall short of this goal (Ewell
1983). Why are data often discounted or ignored? The monograph
discusses three reasons why such assessments may fail to
live up to their potential as management tools: (1) inadequate 
conceptualization, (2) technical barriers, and (3) political bar- 
riers. 

Several aspects of the talent development perspective contrib- 
ute to bridging the gap between researchers and practitioners. 
By rejecting an adversarial approach to evaluation in favor of 
an informational approach, the talent development perspective 
reduces defensiveness and hostility to evaluation. By emphasizing
longitudinal designs with pre- and posttesting, talent development
assessments reduce the ambiguity of findings;
researchers and practitioners are more likely to agree on the 
interpretation of the results. 

A review of the literature indicates a number of factors that 
increase the usefulness of information about outcomes: 

1. Involvement of research practitioners and target audi- 
ences; 

2. Support of top administrators; 

3. Technical quality of the research and the interactions of 
technical and political issues; 

4. Dissemination as an ongoing process of communication
between researchers and practitioners; 

5. Recommendations that are incremental and clearly con- 
nected to the data; 

6. Report formats that are based on issues and directly ad- 
dress concerns of practitioners; and 

7. Structures and settings that provide opportunities for deci- 
sion makers and researchers to jointly review the data. 

What Are Some Practical Suggestions 
For Conducting Assessments? 

The monograph provides a number of practical, nuts-and-bolts 
suggestions for implementing a comprehensive program of out- 
comes assessments: 

1. How to use assessments to facilitate and improve per- 
formance rather than merely to evaluate it; 

2. How to build on what is already there by making better 
use of tests already in use; 

3. How to begin development of a student data base for lon- 
gitudinal student assessment; 

4. How to get more from standardized tests; and 

5. How to encourage students' participation in longitudinal 
assessments. 

Is My Institution Ready to Conduct a 
Student Outcomes Assessment? 

To assist readers in determining their readiness to implement 
assessment programs, a quick "self-study" guide is offered.
The guide includes 15 questions for consideration in planning 
an assessment of outcomes and covers philosophical, concep- 
tual, methodological, and organizational issues. 

FOREWORD



Student assessment is not a fad. It is not a momentary issue 
brought upon us by a transitory reform movement, or some- 
thing that will fade away with a new administration or decade. 
Student assessment in one form or another has been part of 
higher education for years and will be with us as long as we 
want to know anything about the impact or effectiveness of 
what we are doing. The question is not why we are doing it, 
but rather how we can assure that the student assessment 
process is valid. 

The higher education experience can profoundly affect a stu- 
dent in many ways. Intellectual growth, personal and social in- 
teractions, value and ethical development, and religious 
awareness are just a few of the many areas affected by college 
attendance. Therefore one of the major assessment issues is 
what is to be assessed. In this report, written by Maryann Jacobi,
Alexander Astin, and Frank Ayala, Jr., the focus is on
cognitive or intellectual growth or, as the authors put it, "talent
development." Choosing to focus on this one area is not an attempt
to minimize the other effects of higher education on students;
it is merely a recognition that the complexity of the issue
is such that only a single-focus treatment is possible and desirable
in one monograph. We fully intend to address other outcomes
in future reports. The underlying reason for this focus is
the great or predominant interest in assessing the intellectual
outcomes of the collegiate experience.

Jacobi and Astin, both of the University of California at Los 
Angeles, and Ayala, of Incarnate Word College, place a special 
emphasis on helping the readers devise a strategy to determine 
whether their institutions are prepared to implement a valid as- 
sessment program. The next step in this process, of course, is 
to link the outcomes of this evaluation to the policy-making ap- 
paratus. 

Institutions come in different sizes and shapes, public and 
private, teaching-oriented or research-oriented. To assume that 
all colleges and universities will or should instill the same val- 
ues and attitudes in students is wrong and potentially harmful 
to parents, students, and the public alike. Assessing outcomes 
and being able to say "This is what our institution does" will 
have important implications in both faculty recruitment and student
attrition rates. Knowing what your institution does well
may be the edge needed for the next decade to come.



Jonathan D. Fife

Professor and Director 

ERIC Clearinghouse on Higher Education 

School of Education and Human Development 

The George Washington University 

GOALS OF STUDENT OUTCOMES ASSESSMENT



The assessment of student outcomes has often been advocated 
as a means of determining a college's success in meeting its 
educational goals. The assumption underlying such recommendations
is that data about student outcomes indicate institutional
strengths and weaknesses and thereby point to directions for 
improvement. This monograph discusses a variety of issues re- 
lating to the measurement of student outcomes: instruments 
available for such assessments, methodological challenges in 
measurement, and use of the resulting information. 

Concern with outcomes assessment is by no means new in 
postsecondary education.

To evaluate outcomes is difficult. Yet despite these difficul- 
ties, educators have an obligation to assess outcomes as best 
they can, not only to appease outsiders who demand accountability,
but also to improve internal management
(Bowen 1974a, p. 121). 

Similarly, information about outcomes can help an institution 
successfully adapt to changing conditions and thereby maintain 
its stability and identity (Pace 1979). And a better understanding
of the impacts of college on students can provide a foundation
for policy development that includes educational,
economic, and political considerations (Astin 1976, 1977). 

The recent reports of the Study Group on the Conditions of 
Excellence in American Higher Education (1984) and the Asso- 
ciation of American Colleges (AAC) (Project on Redefining 
1985) highlight the promise and potential of outcomes assess- 
ments as tools for institutional self-improvement. Widespread 
concern about the quality of college education in the United 
States prompted the National Institute of Education (NIE) to 
convene a study group to recommend ways to improve bacca- 
laureate education. Their final report (Study Group 1984) rec- 
ommends that colleges systematically assess the development of 
students' knowledge, capacities, and skills during the college 
years. The report suggests that the results of such assessments 
can be used to evaluate and improve student advising and 
placement, curriculum development, and academic and student 
service programs. 

It is futile to adjust the content and delivery of programs in
accordance with redefined, detailed objectives unless one has
some ways of knowing whether those adjustments have been
successful. A comprehensive assessment program will help
faculty determine what works and what does not (Study
Group 1984, p. 55).

Following the NIE and AAC reports, a number of other as- 
sociations have echoed the call for research on outcomes. For 
example, a recent report of the National Governors Association 
recommends that "state attention ... be directed to the outcomes
of the higher education system, namely, measuring how
much students learn in college. Assessment is a way that faculty,
institutions, and institutional sponsors can focus on outcomes
of students, programs, and institutions" (National
Governors Association 1987, p. 156).

In response to these and other reports, a 1987 American 
Council on Education survey of colleges and universities in all 
50 states found that 27 percent of respondents reported their 
states mandate assessment, with 80 percent of respondents an- 
ticipating such a situation within the next few years (El-Khawas 
1987). But will the assessments undertaken by these schools 
really indicate what works and what does not? And will the 
benefits derived from these assessments justify their costs? 

The answers to these questions depend on several factors. 
First, the assessment must produce data that are relevant to is- 
sues facing educational practitioners today. Second, the assess- 
ment should provide information about the change and 
development of students, not only an isolated snapshot of stu- 
dents' competencies at a single time. Third, the longitudinal 
data must include information about students' educational expe- 
riences (course-taking patterns, for example) so that the effects 
of these experiences can be assessed. Finally, the results must 
be analyzed and presented in a manner that facilitates use by 
practitioners. 

The researcher's ability to accomplish these objectives
largely depends on the manner in which outcomes are mea- 
sured. The outcomes researcher must select or design a mea- 
sure that defines the outcomes of interest in a manner 
congruent with the institution's perspectives. The measure must 
be sensitive to change over time and must be nested within a 
research design that provides comparisons across time, stu- 
dents, and different educational experiences. The results of 
measurement must be interpreted and presented in a manner 
that underscores their relevance to the institution's goals. 

This monograph is intended to guide researchers, faculty,
administrators, and policy makers in the measurement of student
outcomes, providing practical suggestions to make outcomes 
assessments as useful as possible. 

Why Study Student Outcomes? 

While the assessment of student outcomes has many advocates, 
experience has shown that such assessments often fail to live 
up to initial expectations about their usefulness. This gap be- 
tween promise and performance is sometimes the consequence 
of methodological (including measurement) shortcomings but 
more often occurs because of unclear or conflicting expecta- 
tions about the goals and purposes of the research. 

A distinction can also be made between what might be 
termed "active" and "passive" uses of outcomes assessment. 
Passive assessment, which is probably the more common appli- 
cation, involves the collection of a broad range of data about 
outcomes to enhance our understanding of how students are in- 
fluenced by their educational programs and experiences. Data 
about outcomes collected in this manner are frequently found to 
have a wide range of uses in program evaluation and planning. 
Active outcomes assessment, on the other hand, is done with 
specific purposes in mind: to determine whether a particular 
program has its intended effects or to provide feedback for stu- 
dents or faculty with the specific idea of enhancing teaching 
and learning. Any given outcomes assessment can of course 
have both active and passive applications. 

A careful consideration of the goals of assessment is essen- 
tial if research methods and measures are to be matched to 
specified goals or expectations. 

Establishing accountability for external agencies 
Institutions of higher education receive financial and other 
forms of support from local, state, and federal governments, 
from taxpayers, from students and their families, and from a 
variety of foundations and organizations. The legitimacy of the 
institution's educational activities is established through the ac- 
creditation process, in which external reviewers evaluate the 
quality of various programs and curricula. The argument for in- 
stitutional "accountability" is based on the assumption that in- 
stitutions have a responsibility to those who provide support to 
demonstrate that institutional goals are being achieved in a 
cost-effective manner. Accountability in higher education can 
be defined as follows: 

It means that colleges and universities are responsible for
conducting their affairs so that the outcomes are worth the
cost. It implies that institutional efforts would be directed
toward appropriate goals and that the outcomes should be 
consistent with these goals and should be achieved at mini- 
mum cost. It also implies that an institution should report 
credible evidence on the degree to which it is achieving its 
mission and on its costs (Bowen 1974b, p. 1). 

Research on student outcomes is only one element in a sys- 
tem of accountability, however (Bowen 1974b). In addition to 
the measurement of student outcomes, assessment for accounta- 
bility could include a consideration of nonstudent outcomes, 
such as faculty productivity and community service. Because 
the primary goals of most colleges and universities concern stu- 
dent learning, however, the assessment of student outcomes is 
fundamental to assessment for accountability. 

Nationwide, outcomes assessment has growing appeal as a 
means of establishing accountability in higher education. Ap- 
proximately one-quarter of states now require state-supported 
institutions to provide some kind of information for assessment. 
While mandated assessment is necessary, however, it is not 
sufficient in establishing accountability. 

Compared with a few years ago, . . . today assessment of 
student learning is no longer a foreign notion. . . . Yet
what remains more elusive is the link between policies of as- 
sessment and accountability. In fewer than a dozen states are 
state colleges and universities required to include information
on student performance assessment as a part of the documentation
of institutional role and mission. Using student
assessment data to improve programs, teaching, and learn- 
ing, and to hold institutions accountable is also not common 
(National Governors Association 1987, p. 32). 

The debate over the benefits and liabilities of performance- 
based funding and other possible consequences of state- 
mandated assessment is likely to continue over the next decade. 
In the meantime, researchers and practitioners face the challenge
of designing outcomes assessments that both respond to
external demands for accountability and also provide useful in- 
formation for internal application. 




Analysis of cost effectiveness 

Closely related to assessment for accountability are analyses of 
cost effectiveness. While an institution might demonstrate that
certain practices facilitate students' growth in desired directions,
one might still ask whether the benefits accrued from
these practices justify their costs. While cost effectiveness is of 
concern to administrators within the institution, external fund- 
ing and review organizations often emphasize it. 

Economists have done most of the student outcomes research 
by attempting to measure the economic value of a college de- 
gree for its recipients. Cost-benefit research examines short- 
and long-term earnings of college graduates to determine the 
return students receive on their investment in higher education 
(see, for example, Mills 1983; Solmon 1973). While much of
this research is seriously flawed from the perspective of meth- 
odology, a more serious problem exists: Many, if not most, 
outcomes of college have a value other than monetary. A 
broader approach to this issue is therefore preferable; hence the 
term "cost-effectiveness analysis" is used instead of "cost-benefit
analysis" to suggest that the costs of higher education
must be weighed against the full range of monetary and non- 
monetary outcomes (Rossi and Freeman 1982). 
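
One simple way to weigh cost against a nonmonetary outcome is a cost-effectiveness ratio: the cost of a program divided by the amount of the outcome it produces. The sketch below is an illustration only. The program names, dollar figures, mean gains, and the function name `cost_per_unit_gain` are hypothetical, and a single ratio of this kind covers just one outcome, not the full range the text calls for.

```python
def cost_per_unit_gain(total_cost, mean_gain):
    """Return the cost of producing one unit of mean gain on an outcome measure."""
    if mean_gain <= 0:
        raise ValueError("mean_gain must be positive to form a ratio")
    return total_cost / mean_gain


# Hypothetical programs: (total cost in dollars, mean gain in test-score points).
programs = {
    "tutoring": (30000.0, 6.0),
    "writing_lab": (45000.0, 10.0),
}

ratios = {name: cost_per_unit_gain(cost, gain) for name, (cost, gain) in programs.items()}
```

In this invented comparison the more expensive program costs less per point of gain, which is the kind of finding a purely monetary cost-benefit analysis could miss.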

Institutional self-improvement 

Assessments of outcome are useful not only for satisfying con- 
cerns about accountability of external agencies but also as an 
aid in planning, program development, and allocation of re- 
sources by institutional managers. 

Program evaluation. Program evaluations seek to understand 
the particular programs and structures within the university that 
contribute to or detract from effectiveness. Over the past two 
decades, the quantity and quality of program evaluation in 
higher education have increased considerably, and a major dis- 
tinction has been drawn between process and outcomes evalua- 
tions (cf. Rossi and Freeman 1982). Process evaluation 
emphasizes issues of program implementation, while outcomes 
evaluation is concerned with the impact of those services. A 
process evaluation of student counseling services, for example, 
might assess whether those students most needing counseling 
were in fact receiving the service and how much (in terms, say, 
of contact hours) they were receiving. An outcome evaluation
of the same service, on the other hand, might assess the extent 
to which students' psychological or academic difficulties were
resolved or ameliorated as a result of counseling. Similarly, a 
process evaluation of the college curriculum might focus on the 
types and numbers of courses taken by various students, while 
an outcome evaluation would look at the effect of these courses 
on students' cognitive skills, success at work, and so forth. 

A related distinction is that between formative and summa- 
tive evaluations (Sylvia, Meier, and Gunn 1985). Formative 
evaluations are similar to process evaluations in that they are 
conducted in the earlier stages of service delivery to help staff 
and managers be more effective in their work. Summative evaluations,
on the other hand, are more like outcome evaluations
in that they focus on the final impacts of the program and are 
more frequently used for decisions regarding future allocation 
of resources. 

Within the framework of evaluation research, student outcome
measurements are more relevant to outcome and summative
evaluations than to process and formative evaluations.
Process evaluations would address whether the various courses 
and programs were reaching the students for whom they were 
intended, whereas outcomes evaluations would determine 
whether the courses and programs were influencing student de- 
velopment in a manner congruent with institutional goals. Both 
types of evaluation information, however, could be useful to 
administrators facing difficult decisions about allocation of re- 
sources or to program directors seeking to develop their pro- 
grams in a competitive environment. 



Student services. In addition to assessing academic programs, 
information about outcomes can be used to improve the quality 
of student services. Information about student outcomes can be 
applied to counseling, orientation, placement, and other student 
personnel functions to increase the fit between students' needs
and a program's impact. Within this perspective, data about 
outcomes are likely to be used on the individual rather than on 
the aggregate level. Placement tests, achievement tests, and as- 
sessments of general education represent outcomes data that can 
be applied to service delivery (Ewell 1983). For example, im- 
provements shown in standardized test scores at the end of a 
student's freshman year could provide useful information to the 
student and his or her academic counselor about the student's 
academic needs and strengths. This information might also 
help identify students at risk for attrition to circumvent that
possibility. 



Setting goals. Assessments for accountability and evaluation as 
described here assume that the institution has established a set 
of goals and that it needs only to determine its success in meet- 
ing them. But what happens when the institution is not sure of 
its goals or wishes to reconsider and perhaps change them? Under
these conditions, an outcomes assessment may be appropriate
as well, not to evaluate progress toward some a priori set of
objectives but rather to facilitate reflection upon what the 
school currently provides to students and how it might be im- 
proved. 

When used in setting goals, outcomes assessments might focus
on a broad array of student behavior, cognition, and affect
and might make special efforts to discern unexpected outcomes 
(side effects) and negative outcomes. Qualitative approaches in- 
volving, for example, open-ended interviews with students and 
other constituents may prove richer and more stimulating than 
the traditional quantitative approaches to outcomes assessment.
Obviously, as setting goals is ultimately a question of values, a 
student outcomes assessment will not in itself indicate what the 
school's goals ought to be. Rather, the assessment may serve
as a starting point for discussion and reflection among students, 
faculty, administrators, alumni, and others about what students 
need to learn in college and about how the institution might 
best contribute to students' development. If nothing else, out- 
comes assessment forces us to make our implicit values and 
goals more explicit. And the mere process of trying to define 
these goals can often serve to help clarify them. 

Strategic planning. Long-range, strategic planning is increas-
ing within higher education both as a response to external de-
mands for accountability and as a proactive effort to provide a
rational basis for decision making in light of an uncertain future 
and rapidly changing external environment. 

In effect, strategic planning examines the big issues: the or-
ganization's purpose, its mission, its relationship to its envi-
ronment, its share of the market, its interactions with other
organizations. Strategic planning is not concerned with nuts-
and-bolts issues .... It asks the basic questions of institu-
tional health and survival (Baldridge 1983, p. 175).

Strategic plans have five benefits: (1) to establish an organiza- 
tional framework, (2) to guide delegation of responsibility and 
allocation of resources, (3) to help motivate people, (4) to
serve as channels of communication, and (5) to provide a basis
for control (or accountability) (Allen 1982). 

Outcomes assessment contributes to institutional strategic 
planning at several stages. First, as discussed earlier, informa-
tion about outcomes can assist faculty and managers within the
institution in defining their goals and objectives. Second, data
about outcomes can also point to critical issues that must be 
resolved for the institution to successfully achieve its goals. 
Third, outcomes assessments are a source of baseline data so 
that both student services personnel and faculty can develop 
programs, policies, and curricula that respond appropriately to 
students' needs and abilities. Finally, outcomes assessments 
provide essential feedback about the effectiveness of long-range 
plans and point to areas where plans must be modified to 
achieve institutional goals. 

Assessment is essential in the early phases of strategic plan- 
ning, necessary for the institution to identify strengths, weak- 
nesses, and opportunities for the future (Sylvia, Meier, and 
Gunn 1985). Assessment for evaluative purposes is also the last
stage of a strategic planning process (cf. Munitz and Wright 
1980). 

Other Uses for Information about Outcomes
Marketing. An increasingly common reason for conducting 
outcomes assessments is to generate information that can be 
used to increase prospective students' awareness and under- 
standing of the institution. In this manner, outcomes assess- 
ments become a marketing tool— a way of communicating with 
the community. Colleges that are trying to attract a larger pool 
of applicants (or trying to increase the quality or diversity of 
their applicants) may wish to inform selected subpopulations of 
prospective students about the likely outcomes of attending that
school. The college may also want to educate community mem- 
bers about student outcomes to increase the congruence be- 
tween community perceptions of the institution and the actual 
benefits delivered by the school. 



A social marketing perspective holds that the main mission
of the organization is to respond appropriately to the needs and
wants of its target markets (Kotler 1982). Within this approach, 
outcomes assessments become a tool to determine the institu- 
tion's effectiveness in meeting the goals of the community (or 
other target markets). For example, if certain local employers 
constitute one target market, the outcomes assessment might fo-
cus on those employers' ratings of the work skills of recent
alumni. If the graduates of a particular high school represent 
another target market, the outcomes assessment might focus on
the qualities most valued by those graduates (income after grad- 
uation, admissions to graduate or professional schools, employ- 
ment opportunities for graduates, for example). 

As competition for resources and students increases, so will 
strategic marketing by colleges and universities. Outcomes as-
sessments may provide information that can be used to increase
community awareness of a school, improve community atti- 
tudes, and facilitate better communication between the school 
and its target markets. 



Basic research. Assessing the effects of college on its students 
is an important area of academic inquiry, even when removed 
from issues of immediate application to policy and manage-
ment. Within the academic context, the broad area of student
outcomes can be addressed from multiple levels of analysis. At 
the collegiate level, the researcher might explore cognitive de- 
velopment, social development, character development and per- 
sonal growth, attitudes, values, and so forth. A broader 
analysis might explore the impact of college on the family, pat-
terns of socialization, or quality of work. An even broader
level of analysis might examine the impact of college on the 
economy, the political structure, and the culture. Academic re- 
searchers may also obtain a better understanding of the nature 
of colleges and universities as complex organizations by a com- 
parison of student outcomes across diverse educational environ- 
ments. 

Much of the published literature on student outcomes is con-
cerned with institutional impacts, describing, for example, the
manner in which a variety of personality traits and attitudes 
change during the college years (Bowen 1980; Feldman and
Newcomb 1969). Such information is probably not perceived as 
especially helpful by administrators struggling to allocate re-
sources, define policy, or develop programs to facilitate the de-
velopment of students. Nonetheless, such information does
provide a backdrop against which to interpret observed out- 
comes within a single institution at one time. The academic 
perspective also makes us step back from day-to-day decisions 
to observe some major impacts of a college education that 
might otherwise go unnoticed. In this manner, data without im-
mediate application may prove useful over time.



Problems in the Use of Data about Outcomes 
Although outcomes information can contribute to both account- 
ability assessments and institutional self-improvement, many in- 
stitutional researchers have found that their reports on outcomes 
only collect dust. Despite their potential as useful management 
tools, the data are often discounted or ignored. The assessment 
of student outcomes can in no way be cost effective if man- 
agers, faculty, or other practitioners do not use the results. Ob- 
stacles to use come in four broad categories. 

First, outcomes assessments may fail to live up to their po- 
tential as management tools as a result of inadequate concep- 
tualization. A careful consideration of the purposes of 
assessment is essential if research methods and procedures are 
to be matched to specified goals or expectations. For example, 
a project that is intended to facilitate reflection upon institu- 
tional goals or curriculum by faculty members may look quite 
different from one that is intended to satisfy concerns of a state 
government about accountability. The objectives of the out-
comes assessment will influence decisions regarding methodol-
ogy, instrumentation, analysis, report format, and
dissemination. The successful project will be based on a set of 
objectives that is clearly delineated and shared by researchers 
and decision makers. 

A second reason for underuse of information about outcomes 
is technical barriers. Methodology that fails to eliminate major 
competing hypotheses, instruments that lack established reli- 
ability or validity, errors in analysis, and so forth significantly 
reduce the ability of an assessment to accurately and unambigu- 
ously point to major outcomes. For example, many outcomes 
projects use cross-sectional rather than longitudinal designs, 
and others neglect to include comparison groups. Another, more
common deficiency is the failure to include "environmental"
information about the students' educational experiences (Astin
and Ayala 1987). Such common approaches are technically un-
suited to determining the effects of college experiences on stu-
dents' development.

Third, outcomes research is neglected or discounted as the 
result of political barriers. Outcomes research is one of many 
pieces of information available to practitioners about institu- 
tional performance. Many other sources of data are available to 
administrators, including subjective impressions, informal inter- 
actions, committee reports and recommendations, reports by 
external agencies, and institutional ratings or reputation (Weiss 
1988). Thus, research data must compete with many other 
sources of information to influence policy decisions. 

Further, many postsecondary institutions are highly conserva- 
tive and faculty or administrators may be invested in maintain- 
ing the status quo. Under such circumstances, resistance is 
mobilized when change is recommended, and information about 
outcomes may become a victim of academic gamesmanship 
(Astin 1976). 

Political barriers often masquerade as technical barriers to 
use. For example, practitioners who find that empirical findings 
threaten the status quo may choose to criticize research method- 
ology rather than take issue directly with the research findings. 
This event is particularly likely when faculty members are 
asked to play an active role in applying data about outcomes to 
modifications in program or policy. The situation can also be
reversed so that technical barriers may appear at first glance to 
be political barriers. For example, a poorly written research re- 
port may discourage active consideration by practitioners, or an 
inappropriate analysis may produce data that are irrelevant to 
institutional issues and therefore ignored. 

Finally, outcomes research will be underused if it is commis- 
sioned to indicate the "best" outcomes or directions for the in- 
stitution. Like all empirical research, outcomes assessments 
cannot indicate what a school's goals should be (Baird 1976; 
Bowen 1974b). Although outcomes research can provide an ac- 
curate description of how students change in response to col- 
lege, the value attached to these changes is ultimately 
subjective and cannot be empirically determined (Astin 1970). 
Facilitation of one outcome may mean that another is over- 
looked; outcomes research cannot indicate which tradeoffs are 
most appropriate for a given institution. Nor can it tell if the 
costs of providing certain student services or educational pro-
grams are justified by the value of the outcomes they facilitate.

Scope of the Analysis

Information about student outcomes can play a critical role in 
institutional planning and policy development; however, the 
measurement of student outcomes poses numerous technical 
and political challenges. Additional challenges are incurred in 
designing assessments that can be applied to institutional man- 
agement and decision making. 

The rewards of well-planned student outcomes assessments
justify their cost, however. The goal of this monograph is to 
increase the usefulness of research on outcomes by offering so- 
lutions to some of the challenges practitioners frequently en- 
counter in gathering information about outcomes or in 
conducting research about outcomes. Issues of measurement are 
emphasized, as little information is currently available about 
this critical component of research about outcomes. The follow- 
ing sections review available instruments for the measurement 
of student outcomes, offer solutions to some methodological 
problems, and discuss the relationship between measurement 
and use of information about outcomes. 

Selection of measurement methods and instruments is always 
based on some implicit or explicit theory of student outcomes. 
When underlying beliefs are unexamined and implicit, selected 
measures may ultimately prove inappropriate for institutional 
goals and policy making. Reflection and discussion about dif- 
ferent concepts of student outcomes, in contrast, will increase 
the likelihood that subsequent research will be useful to admin- 
istrators. 

The following sections discuss three broad areas of concern 
in conducting useful assessments of outcome: (1) philosophical 
and conceptual issues, (2) measurement issues, and (3) contex- 
tual issues related to the integration of research into institu- 
tional decision making. The next section describes a philosophy 
of institutional excellence and effective performance called 
"talent development" (Astin 1985) and suggests that talent de- 
velopment provides a useful framework to plan, administer, in- 
terpret, and apply information about student outcomes. The 
third section provides a more concrete discussion of conceptual 
issues by reviewing outcomes taxonomies that may guide insti- 
tutions in identifying critical student outcomes. After determin- 
ing the factors to assess, the selection of appropriate 
measurement tools poses many challenges. The fourth section
offers a general discussion of issues to consider in the selection 
or design of measurement instruments, and the fifth reviews
over 25 cognitive assessment instruments that may be consid-
ered for use within a talent development perspective. The sixth
and seventh sections focus on contextual issues, with the sixth 
providing a review of evaluation literature related to the use of 
research findings and the seventh offering a number of practical 
suggestions to help institutions get started in assessment from a 
talent development perspective. 

The following sections, and especially the fourth and fifth 
ones, tend to emphasize cognitive rather than affective out- 
comes. In part, the focus on cognitive outcomes is to fill a gap 
in the literature. A considerable volume of research, extending 
over two decades, addresses affective outcomes of higher edu-
cation (see, for example, Astin 1977; Feldman and Newcomb
1969; Pace 1979). Cognitive outcomes, however, have not re- 
ceived this attention in the literature. Thus, the assessment of 
cognitive outcomes, which is perhaps the most difficult task as- 
sociated with assessment, deserves extra attention and visi- 
bility. 

A secondary reason for this focus is that cognitive outcomes 
are central to the mission of higher education and increasingly
a concern of the educational reform movement. 

Assessment of undergraduate learning and college quality
needs, at minimum, to include data about student skills, abil-
ities, and cognitive learning; substantive knowledge of indi-
vidual students and groups of students at various points in
their undergraduate careers; instructional approaches used
by faculty; and educational curricula (National Governors
Association 1986, p. 156).

Thus, increasing numbers of practitioners and administrators 
are likely to face both external and internal demands for infor- 
mation about cognitive outcomes. Many of the conceptual and 
empirical issues discussed in this monograph, however, are 
likely to be as applicable to affective as to cognitive out-
comes and will be helpful to those readers with interests in a
broad array of outcomes.

A PHILOSOPHY OF ASSESSMENT



Any attempt to implement an institutional program of assessing 
student outcomes should be based on some coherent philosophy 
of institutional mission. In particular, the assessment program 
should reflect some conception of what constitutes effective 
performance of that mission. Effective performance, of course, 
is closely allied to concepts of institutional quality or excel-
lence. This section first discusses the authors' conception of in-
stitutional quality or excellence and then suggests some
theoretical and philosophical perspectives that might be applied 
in developing a program of institutional outcomes assessment. 

What Is "Excellence"?

"Excellence" and "quality" are perhaps the most fashionable 
terms in discussions of education these days. But even though 
many of us are fond of talking about excellence, we seldom 
take the trouble to define what we mean by excellence in the 
first place. This is not to say that no implied definitions
underlie many of the time-honored practices of
institutional assessment. What we have failed to do is to make 
these definitions more explicit and to examine them critically. 

The two most commonly used approaches to defining excel- 
lence can be labeled as the reputational and resource ap- 
proaches (Astin 1985). The reputational view holds that 
excellence is equated with an institution's rank in the prestige 
pecking order of institutions as revealed, for example, in pe-
riodic national surveys. The resource approach holds that excel-
lence is equated with such criteria as test scores of entering
freshmen, the endowment, the physical plant, the scholarly pro- 
ductivity of the faculty, and so on. These approaches are mu- 
tually reinforcing in the sense that enhanced reputation can 
bring an institution additional resources, and additional re- 
sources like highly able students and a nationally visible faculty 
can enhance an institution's reputation. 

Perhaps the major limitation of these traditional approaches 
is that they do not necessarily reflect higher education's most 
fundamental purpose: the education of students. If one accepts
the idea that higher education's principal reason for being is to 
develop the talents of students— or, as the economists would 
say, to develop the "human capital" of the nation— then 
"quality" or "excellence" should reflect educational effective- 
ness rather than mere reputation or resources. This alternative 
conception of excellence can be labeled the "talent develop-
ment" view (Astin 1985). The talent development view, then,
holds that a high-quality institution is one that maximizes the
intellectual and personal development of its students. 

These alternative views have important implications for insti-
tutional assessment. Under the reputational and resource ap-
proaches, attention is focused on the caliber of entering
students as reflected in standardized admissions test scores and 
high school grade averages. High-achieving students are thus 
viewed as an important institutional "resource" that also tends
to enhance the institution's reputation. Under a talent develop-
ment approach, on the other hand, assessment focuses more on
changes or improvements in students' performance from entry
to exit. 

In actual practice, the talent development approach might be
applied to an individual campus somewhat as follows. Newly
admitted students would be tested to determine their entering
level of competence for purposes of counseling and placement.
These initial scores would be useful not only in providing in-
formation about a student's specific strengths and weaknesses
but also in establishing a baseline against which to measure that
student's subsequent progress. After the student completes a
course of study, the same or similar assessments would be re-
peated and the differences in performance used to provide criti-
cal information about the student's growth and development,
not only to the student but also to the professor and institution.
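
The entry-to-exit comparison described above can be sketched in a few
lines of Python. This is only an illustrative sketch: the student
identifiers, the scores, and the function name score_gains are
hypothetical, and a raw difference score is simply the most basic
summary of change, not a method prescribed by the authors.

```python
from statistics import mean

# Hypothetical entry and exit scores on the same (or a similar) instrument.
entry_scores = {"student_a": 48, "student_b": 62, "student_c": 55}
exit_scores = {"student_a": 63, "student_b": 70, "student_c": 54}


def score_gains(entry, exit_):
    """Return exit-minus-entry differences for students tested at both points."""
    return {sid: exit_[sid] - entry[sid] for sid in entry if sid in exit_}


gains = score_gains(entry_scores, exit_scores)
average_gain = mean(gains.values())

# Individual gains can inform the student and professor;
# the average is one possible institution-level summary.
print(gains)
print(round(average_gain, 2))
```

In this made-up data one student shows a small decline, the kind of
individual result that could be fed back to the student and the
professor as well as to the institution.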

The talent development approach does not depend on the use 
of any particular method of assessment. Objective tests and es- 
says, interviews, departmental examinations, work samples, 
performance examinations, and any other devices might be ap- 
propriate, depending on the content and objectives of the cur- 
riculum or program being assessed. 

A Theory of Educational Practice 

How can talent development assessment be used to improve ed- 
ucational practices? To answer this question, it is first neces- 
sary to outline at least the basic elements of the authors' 
conception of how administrators and faculty members operate 
and how students learn and develop. 

The educational practitioner is a kind of "performing artist"
(Astin and Scherrei 1980). Following this analogy, it is impor- 
tant to realize that an essential ingredient in any performing art- 
ist's development of technique and skills is the opportunity to 
view the results of his or her work. Neophyte painters see what
comes out on the canvas, and aspiring musicians hear what
they play or sing, and they adjust their behavior accordingly.

If administrators and faculty members try to enhance the stu-
dent's talent development as a means of gauging the effective-
ness of their efforts, it seems that few of these practitioners
ever receive appropriate feedback about the results of their
practices. They are like artists learning to paint blindfolded or
musicians learning to play the violin with their ears plugged.

While it is true that college faculty members, as they prac- 
tice the "performing art" of teaching and learning, receive 
some informal feedback from their students, this input rarely 
provides any systematic information about how much and how 
well students are actually learning. Professors might argue that
their final examinations allow them to evaluate the quality of
learning, but in many respects, relying on final examinations is 
like closing the barn door after the horse has escaped. Indeed, 
performance on final examinations is very difficult to evaluate 
without some clear notion as to how well students were per-
forming at the beginning of the course. As for advising, profes-
sors rarely have the opportunity to learn about their successes
and failures in this important enterprise.

The analogy of performing artist can be extended to support 
staff as well. Many areas of institutional functioning affect stu- 
dents directly: registration, orientation, financial aid, housing, 
food services, parking, social activities, career counseling, per- 
sonal counseling, extracurricular activities, health services, job 
placement. How can the personnel responsible for these diverse 
student services improve their programs and policies unless 
they solicit systematic evaluations of their efforts from the stu- 
dents they serve? 

What kinds of information about student development are
most likely to be of use to faculty and administrators? If these 
practitioners are to develop effective short- and long-term strat- 
egies for their colleges and students, they must have a theory 
of how students learn, of what facilitates or inhibits students' 
educational development. While each institution must develop 
its own theory, some theory is a critical ingredient in designing 
a truly effective assessment program. The authors' preference is 
for a theory of student development that has evolved from sev- 
eral major studies of institutional impact on student develop- 
ment (see Astin 1975, 1977, 1985). A principal concept in this 
theory is that of student involvement: the time and the physical
and psychological energy that the student invests in the aca-
demic experience. The more students are involved in the aca-
demic experience, the greater their learning and growth and the
more fully their talents are likely to develop. The less they are
involved, the less they learn and the greater the chances they 
will become dissatisfied and drop out. Under these circum- 
stances, talent development is obviously minimized. A recent 
report, Involvement in Learning: Realizing the Potential of
American Higher Education (Study Group 1984), embraces the 
involvement theory. The concept of involvement suggests, 
among other things, that any assessment program should at- 
tempt to determine how much time and energy students actually 
invest in their educational experience. 



OUTCOME TAXONOMIES 



This monograph offers a broad definition of student outcomes 
as "the wide range of phenomena that can be influenced by the 
educational experience." While such a definition has the ad- 
vantage of allowing practitioners to interpret talent development 
assessments in the manner that best fits their needs, it leaves a 
number of questions unanswered. For example, what behaviors, 
cognitions, and attitudes is the educational program designed to 
enhance? Can we observe outcomes of college while the col- 
lege experience is still unfolding (that is, while students are 
still enrolled), or must we wait until many years after gradua- 
tion? Should outcomes be limited to the effects of the formal 
educational program, or should we also examine the often ser- 
endipitous results of informal experiences? Is it appropriate to 
limit our assessments to the planned or expected effects of a 
program, or should we also examine possible unintended "side
effects"? 

The authors' definition should also be viewed in light of 
whether outcomes assessment is an exercise in description or in 
explanation. Research on outcomes can attempt to establish 
causal relationships between the college environment and ob- 
served student outcomes, or it can merely document students' 
performance at particular points in time. By focusing on out- 
comes that can be influenced by the educational programs, the 
authors' definition clearly reflects a concern with the impact of 
the college environment on students. 

In implementing a talent development philosophy and assess- 
ment program, faculty, staff, and managers must carefully con- 
sider the outcomes of most importance to the mission and goals 
of the institution. Efforts to identify appropriate outcome mea- 
sures can be aided by a variety of outcome taxonomies. Per- 
haps the most important contribution of such taxonomies to 
implementation of a talent development approach is to support 
institutional dialogue about the outcomes of most importance to 
a college or university. From this perspective, taxonomies pro- 
vide a menu from which researchers and practitioners may se- 
lect the items of greatest importance to measure and track. 

This chapter describes four different outcome taxonomies, 
each of which has been widely used in institutional planning 
and research. Three of them (Lenning, Bowen, and Astin) were 
developed from relatively global or broad perspectives, provid- 
ing a comprehensive set of potential outcomes. The fourth was 
developed by faculty, institutional researchers, and administra-
tors in response to the goals and mission of a particular institu-
tion, Alverno College. Because these taxonomies differ in
content, organization, and breadth, they are best viewed as 
complementary rather than competing schemas. 

Lenning 

Lenning and associates (1977, 1980) present a highly refined 
and detailed taxonomy of outcomes. In traditional taxonomic
style, they offer several major headings, each of which includes
various levels and types of outcomes. Major categories of out-
comes include, first, economic outcomes, including students'
access to resources, accumulation of resources, production, and
so forth. Economic resource outcomes emphasize the contribu-
tion of higher education to an individual's future income, earn-
ing ability, and productivity. A second category Lenning
proposes is human characteristics outcomes. This somewhat
generalized phrase subsumes such outcomes as aspirations,
competence and skills, morale, personality, physical/physiologi-
cal characteristics, social activities, and social status and recog-
nition. The third category, knowledge, technology, and art
form functions, includes those outcomes most directly linked to
substantive elements of college education, such as general and
specialized knowledge, research and scholarship products, and
art works. Resource and service provision outcomes, the fourth
category, includes the provision of facilities, events, and serv-
ices. The fifth category comprises aesthetic and cultural activi-
ties as well as the organization and operation of the institution.

Lenning's typology, which was derived from a content 
analysis of the literature on outcomes, is most distinctive for its
comprehensive detail. (In fact, his typology is not restricted
only to student outcomes, and the last two categories described
in the preceding paragraph are focused on the organizational or
community level of analysis.) Lenning's approach is most con-
gruent with a management perspective, as the typology deline-
ates a range of outcomes that can serve as evaluation criteria
and guide decision makers in allocating resources (Ewell 1983).

The broad range of outcomes Lenning describes may suggest 
to researchers that an outcomes assessment should include an
equally broad range of dependent variables. While this ap- 
proach may be appropriate under certain circumstances, the 
most useful assessments will be based on outcomes that have 
been carefully selected for their relevance to institutional goals 
and policy questions.

Bowen

Like Lenning, Bowen (1980) offers a taxonomic system that is
based on a review and content analysis of the literature on stu- 
dent outcomes and includes outcomes at levels of analysis other 
than the individual student. In contrast to Lenning, however, 
Bowen ties his typology to goals that many institutions hold for
their students. In fact, he offers a catalog of goals rather than 
outcomes and then uses this catalog to organize his review of 
the literature on student outcomes. This organizational system 
may be directly translated into research objectives, as the selec- 
tion of dependent variables is clearly linked to institutional 
goals. 

Bowen's five main categories are cognitive learning, emo-
tional and moral development, practical competence, direct
satisfactions from college, and the avoidance of negative out-
comes. The content of Bowen's schema differs from Lenning's
in several ways. First, Bowen includes a more detailed list of 
outcomes of practical competence, while Lenning includes 
more outcomes involving human characteristics. Second, 
Bowen emphasizes the avoidance of negative outcomes, which 
can add an additional dimension to assessments of outcome 
(similar to side effects in medical research). Third, Bowen in-
cludes students' satisfaction with college as a major classifica-
tion of outcomes.



Astin 

Like Bowen's, Astin's taxonomy (1974, 1977) is driven by a
consideration of the goals of higher education, which includes
faculty development and community services as well as student
outcomes. (This discussion, however, is limited to student out-
comes.) Astin's taxonomy is more complex than Lenning's and
Bowen's in the sense that it includes three dimensions: type of
outcome, type of data, and time. Further, Astin provides a tax-
onomic system for measures of student outcomes, while Len-
ning and Bowen classify outcome variables.
The type of outcome is divided into cognitive and affective: 

Cognitive measures have to do with behavior that requires 
the use of high-order mental processes, such as reasoning 
and logic .... Noncognitive, or affective, measures have to 
do with the student's attitudes, values, self-concept, aspira-
tions, and social and interpersonal relationships (Astin
1974, p. 30).



Type of data refers to the manner in which each outcome is
actually measured. "Psychological" measures reflect the inter-
nal states of individuals, while "behavioral" measures refer to
their observable activities.

Astin's third dimension is time. Some outcomes of college
are observable after a brief period of time and may be measura-
ble while the individual is still a student. Others may not be
observable or measurable for many years. For example, stu- 
dents' knowledge of current research findings within their ma- 
jor field is a short-term outcome that can be measured after 
several semesters or classes. In contrast, students' ability to ef- 
fectively apply this knowledge in their chosen careers is a long- 
term outcome that cannot be assessed until after the student has 
held a career position for some time. 

Compared to Lenning and Bowen, Astin provides less detail
about specific student outcomes. Because Astin argues, how-
ever, that a comprehensive outcomes assessment requires all
eight types of data (2 x 2 x 2), his three-way matrix can pro-
vide a basis for evaluating available outcome data. For exam-
ple, one might discover that some institutions collect data
almost exclusively within one or two cells of the matrix and 
thereby obtain an incomplete picture of student outcomes. 
Other schools might have data available from all cells but 
might require more depth and detail within a single cell or bet- 
ter integration across cells. 
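
The audit described above can be sketched in a few lines of Python. This is only an illustration, not part of Astin's work: the dimension labels "short-term" and "long-term", the function names, and the sample inventory of measures are all hypothetical, and classifying any real measure into a cell remains a matter of judgment.

```python
from itertools import product

# The three dimensions of the matrix; the two time labels are assumed wording.
OUTCOME_TYPES = ("cognitive", "affective")
DATA_TYPES = ("psychological", "behavioral")
TIME_FRAMES = ("short-term", "long-term")


def cell_coverage(measures):
    """Count how many measures fall into each of the eight (2 x 2 x 2) cells."""
    counts = {cell: 0 for cell in product(OUTCOME_TYPES, DATA_TYPES, TIME_FRAMES)}
    for name, cell in measures.items():
        if cell not in counts:
            raise ValueError(f"{name!r} has an unknown classification: {cell}")
        counts[cell] += 1
    return counts


def empty_cells(measures):
    """List the cells for which the institution holds no data at all."""
    return [cell for cell, n in cell_coverage(measures).items() if n == 0]


# Hypothetical inventory of the measures one institution already collects.
inventory = {
    "major field exam": ("cognitive", "psychological", "short-term"),
    "critical thinking test": ("cognitive", "psychological", "short-term"),
    "alumni employment survey": ("cognitive", "behavioral", "long-term"),
}

for cell in empty_cells(inventory):
    print("no data for:", cell)
```

With this sample inventory, six of the eight cells are empty, which is the "incomplete picture" the text warns about.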

Mentkowski and Doherty

One distinguishing aspect of Mentkowski and Doherty's taxo-
nomic system (1983) is that it was collaboratively developed by
faculty and administrators at Alverno College as an integral ele-
ment of their efforts to implement an "outcome-centered liberal
arts program." The other taxonomies described here were de- 
veloped as part of scholarly research rather than as part of insti- 
tutional management and decision making. 

In response to increasing concerns about institutional ac- 
countability and changing needs of students, Alverno College 
decided to implement an outcome-centered liberal arts program 
in 1973. The faculty was asked to identify broad educational 
goals and to suggest how those goals could be defined, as-
sessed, and validated. Students' progress toward the goals was
measured at several points, both during and after college.

Faculty identified eight outcomes for assessment that re- 
flected their views about the goals of the liberal arts program: 
communications, analysis, problem solving, valuing, social in-
teraction, taking responsibility for the environment, involve-
ment in the contemporary world, and aesthetic response. This
broad taxonomy of outcomes became the basis for student as- 
sessments and evaluations of educational effectiveness. 

Unlike the taxonomies previously presented, the Alverno tax-
onomy was developed in concert with a reconceptualization of
the institution's goals. Lenning's, Bowen's, and Astin's taxon-
omies, in contrast, were derived from an analysis of the litera-
ture on student outcomes. The Alverno outcomes, however,
cluster heavily in Lenning's "human characteristics" category.
They appear to be somewhat more dispersed across Bowen's
categories, covering "cognitive learning," "emotional and
moral development," and "practical competence." Viewed
from Astin's perspective, the Alverno model includes both af- 
fective and cognitive outcomes, both behavioral and psycholog- 
ical data, and assessments conducted at several points in time. 

The advantage of the Alverno taxonomy is that it is highly 
congruent with the goals of the institution. Because the taxon- 
omy was developed internally, key decision makers perceived it 
as valid and relevant. As a result, program evaluations and out- 
comes assessments derived from the taxonomy have become in- 
tegral aspects of institutional management. One political 
disadvantage of this taxonomy is that it is restricted to those
outcomes viewed as most important by the community. Conse- 
quently, research based solely on these eight categories may 
overlook outcomes that are potentially important from alterna- 
tive perspectives. 

The following discussion emphasizes cognitive outcomes of 
postsecondary education. Cognitive outcomes are typically per-
ceived as the most important college outcomes and most related
to primary goals of the institution. A broad range of constitu-
ents and decision makers within the institution share a concern
with students' cognitive development as a result of their college
education. Therefore, cognitive outcome assessments are most 
likely to gain acceptance from institutional leaders. A second 
reason for the emphasis on cognitive outcomes is that those
who argue for greater "accountability" in higher education typ-
ically have cognitive outcomes in mind.

The assessment of cognitive outcomes of college is a chal-
lenging task. The following sections consider in depth both the
technical and political problems such projects may encounter
and offer guidelines to the solution of such problems.

ISSUES OF MEASUREMENT IN TALENT
DEVELOPMENT ASSESSMENT



An institution embarking on a talent development assessment 
must face the challenge of selecting or devising appropriate as- 
sessment instruments. This section discusses some broad meth- 
odological issues to consider in selecting an appropriate 
measurement instrument. 

The discussion of measurement issues focuses on the assess- 
ment of cognitive outcomes. While a talent development ap- 
proach can (and should) include affective as well as cognitive 
outcomes, the measurement of cognitive outcomes is especially 
difficult. Researchers interested in assessing affective outcomes
can choose between several widely used instruments (for exam-
ple, the Cooperative Institutional Research Program Freshman
and Follow-up Surveys, the College Student Experiences Ques-
tionnaire developed by C. Robert Pace), but researchers inter-
ested in assessing cognitive outcomes will encounter considerable
confusion about the appropriate instruments. Focusing the discus-
sion on cognitive instruments has the aim of reducing this confu-
sion, with the added hope that the issues discussed may prove
helpful in planning affective, attitudinal, and behavioral outcomes
assessments as well.

Finding the Instrument to Fit the Institution's Needs 
Talent development assessments may be conducted with stan- 
dardized assessment instruments, commercially available from 
testing organizations, or with locally designed instruments de- 
veloped by faculty and institutional researchers on campus. 
Standardized assessment instruments offer the user several ad- 
vantages relative to instruments designed within the institution. 
First, these measures generally have established reliability and 
validity. Second, comparative and normative data based on na- 
tional samples of students are often available and can be useful 
in the interpretation of test results. Finally, such instruments 
are usually more efficient to administer and score, given the 
established procedures and support services provided by the 
vendors. 

Even with these advantages, institutions often find estab- 
lished instruments unsuited to their needs. Of primary concern 
is the fit between how testing organizations and institutional 
personnel define key concepts for assessment. Concepts such as 
analytical abilities, problem solving, critical thinking, and writ-
ing ability are subject to a wide range of interpretation. If the
institution's leaders (be they academic officers or faculty re-
searchers) and the test vendors differ in their conceptual defini-
tions and interpretations, the established instrument is unlikely
to be useful to the institution, despite its technical strengths.

An even greater danger is using standardized instruments to 
avoid internal efforts to clarify key concepts or to define goal 
statements. By accepting without reflection a concept or defini- 
tion offered by a vendor, an institution loses (or at best defers) 
an important opportunity to reflect on educational goals and ob- 
jectives. As a result, both practitioners and researchers are 
likely to find that the information collected fails to inform eval- 
uation or program and curriculum development. According to 
Ernest L. Boyer, "Any college that has not thought carefully 
about goals should not even open the issue of collegewide as-
sessment" (Chronicle of Higher Education, 15 October 1986,
p. 41). To this might be added that any college that has not
thought carefully about the operational definitions of its high- 
priority outcomes is not ready to select and administer stan- 
dardized assessments. 

Several researchers and practitioners have argued that stan- 
dardized tests that measure meaningful outcomes of higher edu- 
cation are simply unavailable. 

Thus, there are standardized tests available that seek to mea- 
sure achievement in both general and specialized education. 
But for the most part, the tests are, we believe, re-
stricted .... Colleges run the risk of measuring that which
matters least (Boyer 1987, p. 256). 

Standardized tests focus on minimum competence rather than 
advanced knowledge and emphasize specialized knowledge 
over more abstract but more important outcomes (Edgarton 
1987). Similarly, 

When the objectives for a general education curriculum are
compared with the content of the commercial tests available, 
it is apparent that none of the tests measure more than half 
of the broad understanding most faculty members believe gen- 
eral education should impart (Banta and Fisher 1987, p. 45). 

Recent innovative approaches to test design, however, indicate 
a growing interest among vendors and researchers in the devel-
opment of instruments that respond to institutional needs. For
example, the ETS Academic Profile, now being pilot tested, is
designed to measure students' skills within broad academic
areas, not specialized knowledge. Other efforts seek to design stan-
dardized writing assessments that are useful, fair, and affordable
(Quellmalz 1984).

Even when an instrument does appear to match the needs of 
the institution, difficulties may arise in relationships with ven- 
dors. Institutional researchers should discuss their goals for as- 
sessment and research design with vendors so that they can 
determine in advance whether vendors will provide support for 
talent development. When vendors resist applications of exist- 
ing instruments for talent development (for example, by provid- 
ing only total scores rather than item or subscale scores or by 
providing only relativistic rather than absolute scores), groups 
of institutions making joint appeals are likely to be more effec- 
tive than individual requests for accommodation. 

In contrast, locally developed assessment instruments can re- 
spond directly to institutional goals and priorities. They provide 
an opportunity to involve faculty and managers in a collabora-
tive effort to reflect upon and define key educational objec-
tives. As a result, a number of outcomes researchers strongly
support the use of locally developed instruments. 



In general, then, if one wishes to reach a specific decision, 
it is better to use a locally devised questionnaire concerning 
specific local conditions or to adapt an instrument from an-
other institution than to use a device developed for a broad,
national market that can focus only on general questions. 
One may lose national comparative information, but one in- 
creases the direct applicability of results (Baird 1976, p. 17). 



Nonetheless, locally developed assessments have several dis- 
advantages. First, they are expensive and time consuming to 
develop. Second, they may lack established test-retest reliabil- 
ity, internal consistency, and validity, therefore yielding results 
of questionable accuracy. Third, comparative data from other 
institutions are rarely available for locally developed instru- 
ments, and longitudinal data providing trends over time may be 
similarly unavailable. The absence of cross-sectional or longitu-
dinal comparisons may limit one's ability to clearly interpret the
data collected and develop action recommendations on the basis
of the findings.

To minimize the tradeoffs between standardized and locally 
developed instruments, institutions should consider using both
approaches in combination, thereby providing multiple mea-
sures of key outcomes. Although any single instrument may be
insufficient to assess key outcomes when used alone, standard- 
ized tests can significantly contribute to an understanding of 
students' learning, especially when used in combination with 
other instruments and approaches. Institutions with vigorous 
value-added assessment programs in place, such as Alverno 
College and Northeast Missouri State University, tend to use a 
combination of standardized tests and surveys as well as locally 
designed assessment tools. 



Methodological Issues for Consideration in
Selecting Assessment Instruments

The review of cognitive assessment instruments indicates four 
recurring methodological issues that influence the suitability of
quantitative instruments for talent development purposes. Con- 
sideration of these issues is essential in selecting or developing 
assessment tools. 

First is the likelihood that students will bottom out on the 
pretest or top out on the posttest. If a test is too hard or too 
easy for a group, researchers will lose the ability to make valid 
cross-sectional and longitudinal comparisons. For example, al- 
though an incoming freshman cohort may demonstrate a range 
of scores on many basic skills tests, graduating seniors may 
tend to top out, limiting the worth of a talent development as- 
sessment approach. On the other hand, some subject matter 
competency tests may prove of such difficulty to freshmen that 
their scores would show little variance. 
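
A simple screening check can make this first issue concrete. The sketch below reports the share of a cohort scoring near the bottom or the top of a test's scale. The function name, the 5 percent margin, and the sample senior scores are illustrative assumptions; no particular cutoff is prescribed here, and an institution would choose its own.

```python
def floor_ceiling_rates(scores, min_score, max_score, margin=0.05):
    """Return the shares of scores within `margin` of the scale floor and ceiling.

    `margin` is a fraction of the scale's range; 0.05 is an arbitrary default.
    """
    if not scores:
        raise ValueError("scores must not be empty")
    span = max_score - min_score
    floor_cut = min_score + margin * span
    ceiling_cut = max_score - margin * span
    at_floor = sum(s <= floor_cut for s in scores) / len(scores)
    at_ceiling = sum(s >= ceiling_cut for s in scores) / len(scores)
    return at_floor, at_ceiling


# Hypothetical senior posttest scores on a basic skills test scored 0 to 100.
senior_scores = [98, 100, 96, 88, 100, 97, 99, 75]
at_floor, at_ceiling = floor_ceiling_rates(senior_scores, 0, 100)
print(f"near floor: {at_floor:.0%}, near ceiling: {at_ceiling:.0%}")
```

In this made-up cohort three quarters of the seniors score within five points of the maximum, so the test leaves little room to register further growth.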

The psychological effects of bottoming out on a pretest also
deserve consideration. An unanticipated effect of pretesting un- 
der these conditions may be anxiety, discouragement, frustra- 
tion, and anger among students who have struggled for several 
hours with questions that are beyond their current capabilities. 
These negative effects may be particularly acute when such 
tests are administered to incoming freshmen, many of whom 
are already uncertain about their ability to succeed in the new, 
more demanding college environment. A related concern, 
called "evaluation apprehension" (Cook and Campbell 1979),
refers to the common desire to be evaluated favorably by re- 
searchers. Students' inability to achieve this goal may be dis- 
tressing and lead to increased levels of test anxiety in the 
future.

A second methodological factor for consideration in the se-
lection of standardized tests is the provision of item scores as
well as scaled and total scores. An item-by-item analysis most
appropriately serves the diagnostic and evaluation purposes of a 
talent development approach. For example, an individual stu- 
dent may receive similar total or scaled scores on a pre- and a 
posttest; however, an item analysis could show that scores in 
one area increased significantly while those in another section 
decreased. In addition, item-by-item analyses provide an oppor- 
tunity to determine more precisely the level of knowledge or 
skill achieved by a cohort of students. Unfortunately, many 
commercially available instruments, such as the SAT and the 
GRE, do not provide item scores, limiting the usefulness of the 
test for talent development applications. 
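
The example of a student whose total score is unchanged can be worked through directly. The following sketch assumes a hypothetical eight-item test, scored 1 for correct and 0 for incorrect, with two hypothetical subscales; the function and subscale names are invented for illustration.

```python
def subscale_changes(pre_items, post_items, subscales):
    """Return the posttest-minus-pretest change in correct items per subscale."""
    changes = {}
    for name, item_ids in subscales.items():
        pre = sum(pre_items[i] for i in item_ids)
        post = sum(post_items[i] for i in item_ids)
        changes[name] = post - pre
    return changes


# Hypothetical mapping of item positions to subscales.
subscales = {"reasoning": [0, 1, 2, 3], "computation": [4, 5, 6, 7]}

# One student's item scores (1 = correct, 0 = incorrect); both totals equal 5.
pre_items = [1, 0, 0, 0, 1, 1, 1, 1]
post_items = [1, 1, 1, 1, 1, 0, 0, 0]

print("total change:", sum(post_items) - sum(pre_items))
print("by subscale:", subscale_changes(pre_items, post_items, subscales))
```

The total shows no change, while the item-level view shows a gain of three items on one subscale offset by a loss of three on the other.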

The third methodological issue involves the validity of indi- 
vidual rather than aggregate scores. Several of the instruments 
to be reviewed, such as the ACT COMP Objective Test, pro- 
vide scores that are meaningful only for a cohort. While such 
aggregate analyses can be helpful in gauging the progress of 
groups of students, institutions also need pretest scores for the 
academic placement and diagnosis of individual students. In 
this case, users should carefully review measures for the reli- 
ability of individual scores. With aggregate scores, one must be 
aware of the potential threat to validity posed by significant at- 
trition from the sample assessed. In such instances, an appro- 
priate response would involve recalculation of pretest scores 
based on the sample completing the posttest. An institution that
is alert to such issues will clearly strengthen its talent develop-
ment efforts.

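The recalculation recommended for aggregate scores can be sketched as follows. The example assumes scores are keyed by a student identifier so that pretests and posttests can be matched; the function name and the four-student cohort are hypothetical.

```python
from statistics import mean


def cohort_gain(pretest, posttest):
    """Compare a naive cohort gain with one restricted to posttest completers.

    `pretest` and `posttest` map a student identifier to a score.
    """
    completers = sorted(set(pretest) & set(posttest))
    if not completers:
        raise ValueError("no student completed both tests")
    # Naive: every pretest taker versus every posttest taker.
    naive_gain = mean(posttest.values()) - mean(pretest.values())
    # Matched: pretest mean recalculated for those who completed the posttest.
    matched_pre = mean(pretest[s] for s in completers)
    matched_post = mean(posttest[s] for s in completers)
    return {
        "naive_gain": naive_gain,
        "matched_gain": matched_post - matched_pre,
        "n_completers": len(completers),
    }


# Hypothetical cohort in which the two lowest-scoring students left before the posttest.
pretest = {"a": 40, "b": 50, "c": 60, "d": 70}
posttest = {"c": 70, "d": 80}
print(cohort_gain(pretest, posttest))
```

Here the naive comparison suggests a 20-point gain, but half of it reflects who left the sample; among students who took both tests the gain is 10 points.
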
A fourth issue is the need for absolute measures as well as 
relativistic measures. Test scores that reflect a student's per- 
formance relative to other students pose difficulties in longitu- 
dinal repeated-measures assessments designed to indicate 
students' development. For example, relativistic scores often 
mask improvements in a student's test performance, because 
the total cohort may show similar (or greater) increases. Selec- 
tive attrition from a sample can further reduce the usefulness of 
relative scores as indicators of talent development, as character- 
istics of the groups against which an individual's performance 
is evaluated or ranked may change significantly between pretest 
and posttest. 
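
A small worked example shows how a relativistic score can hide growth. Percentile rank is used here as one familiar kind of relative score; that choice, the function name, and the four-student cohort are assumptions made for illustration.

```python
def percentile_rank(score, cohort):
    """Percentage of the cohort scoring strictly below `score`."""
    return 100 * sum(s < score for s in cohort) / len(cohort)


# Hypothetical cohort in which every student improves by 15 points.
pre = {"ana": 40, "ben": 50, "cai": 60, "dee": 70}
post = {name: score + 15 for name, score in pre.items()}

for name in pre:
    absolute_gain = post[name] - pre[name]
    rank_change = (percentile_rank(post[name], list(post.values()))
                   - percentile_rank(pre[name], list(pre.values())))
    print(name, "absolute gain:", absolute_gain, "rank change:", rank_change)
```

Every student gains 15 points on the absolute scale, yet no one's percentile rank moves, because the whole cohort improved together.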

Confounding Factors in the Administration
of Pre- and Posttest Assessments
A useful program for assessment adheres to established stan- 
dards of research design (see Astin 1970; Cook and Campbell 
1979; and Kerlinger 1973 for in-depth discussions of research 
design for outcomes assessment). It should be noted, however, 
that assessment findings are uninterpretable in the absence of 
comparison groups. What does it mean, for example, to find 
that students gained 60 points on a standard test of cognitive 
ability between their freshman and senior years? The data have
more relevance when one compares, for example, social sci- 
ence to physical science majors or on-campus residents to com- 
muting students. Even under these conditions, however, the 
effects of the particular institutional environment on students' 
development cannot be discerned, because the study provides 
no variation on this dimension. For this reason, multi- 
institutional studies, despite the logistical problems they can 
present, are strongly recommended. 

The selection of a standardized instrument for talent develop- 
ment assessment must be guided not only by the manner in 
which the test defines key concepts but also by such test char- 
acteristics as internal reliability, test-retest reliability, and con- 
vergent and discriminant validity. Even when standardized 
instruments have established reliability and validity, however, 
the manner in which such instruments are administered in the 
field is of critical importance to the accuracy of the findings. A
variety of threats can affect the internal, construct, and external 
validity of applied research (Cook and Campbell 1979). Threats 
to the validity of pre- and posttest assessments provide alterna- 
tive explanations for observed changes in students' scores, 
thereby raising the possibility that such changes are an artifact 
of uncontrolled factors rather than the result of the educational 
program. This section briefly describes the potential confounds 
of most relevance to outcomes assessment in higher education. 

History 

History can be a threat "when an observed effect might be due 
to an event that [takes] place between the pretest and the post- 
test, when this event is not the treatment of research interest" 
(Cook and Campbell 1979, p. 51). Outcomes assessments are 
particularly vulnerable to this threat, especially when a consid-
erable period of time elapses between pretest and posttest. For
example, an international event that captures students' involve-
ment or a summer tour to Europe for a group of students may
lead to a gain in scores on political science measures that is
independent of the effects of the curriculum. Alternatively, a
major concert on campus attended by large numbers of students
the night before the posttest may lead to tired students and de-
pressed scores the following day. Under these circumstances,
historical factors provide alternative explanations for an ob- 
served change from pretest to posttest. If these factors are not 
taken into account, practitioners may draw misleading conclu- 
sions from the data. 



Maturation 

When changes from pre- to posttest are potentially the effect of 
simple development rather than an educational intervention, 
maturation may be a confounding factor. Maturation is the ma-
jor confounding variable in value-added assessments (Pascarella
1987). One possible solution to this confound, providing
comparison groups of young adults who are not enrolled in col-
lege, presents both practical problems (securing compliance
from such a group and finding the resources internally to sup-
port this effort) and technical problems (because youth who do
attend and who do not attend college differ in many ways).
Comparing changes in scores of traditional 18- to 22-year-old
students with those of older, returning students can also be a
means of identifying the possible effects of maturation (Pascar-
ella 1987).

Testing 

Multiple administrations of the same test may improve stu- 
dents' performance as a result of the effect of practice. For 
talent development assessments, this confound is likely when 
tests are administered within a relatively short time and/or at 
repeated points. Vendors that offer alternative, parallel forms 
of the same instrument provide one method of avoiding this 
confound. 



Instrumentation 

Instrumentation threatens validity when observed changes from
pre- to posttest may be the result of a change in the measuring
instrument, an especially likely possibility when the "measur-
ing instrument" is human. For example, if a team of faculty
members is asked to review students' essays to measure the de-
velopment of writing abilities over time, findings may be con-
founded by systematic variations in the review process between
the pre- and posttest (or even within a test session, if different
reviewers are inconsistent). Similarly, different styles of admin-
istering the pre- and posttest (for example, providing extra time
or helpful hints) may lead to instrumentation confounds. Using
standardized, detailed criteria for the administration and scoring
of tests, with frequent inter-rater reliability checks, is one way
to reduce the threat of instrumentation. Another approach is to 
review both pretests and posttests at the same time or to reread 
a sample of pretests following administration of the posttests to 
determine whether grading criteria are being applied in a simi- 
lar manner. 
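
An inter-rater reliability check of the kind mentioned above can be sketched with Cohen's kappa, a widely used index of agreement between two raters that corrects for chance. The text does not name a statistic, so kappa is an assumption here, as are the function name and the sample essay ratings.

```python
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters assigning categories to the same essays."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must score the same non-empty set of essays")
    n = len(rater_a)
    # Proportion of essays on which the two raters agree exactly.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance, from each rater's category frequencies.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1:
        raise ValueError("kappa is undefined when both raters use one category")
    return (observed - expected) / (1 - expected)


# Hypothetical holistic essay ratings (2 or 3) from two faculty reviewers.
reviewer_1 = [3, 3, 2, 2]
reviewer_2 = [3, 3, 2, 3]
print(round(cohen_kappa(reviewer_1, reviewer_2), 2))  # 0.5
```

The reviewers agree on three of four essays, but half of that agreement would be expected by chance, so kappa is 0.5 rather than 0.75. Computing such an index separately for pretest and posttest readings is one way to see whether the review process has drifted.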

Instrumentation is also a potential threat if different forms of 
established instruments are not equivalent. Although most test- 
ing companies establish the equivalence of alternative forms of 
the same test according to a rigorous set of standards, review 
of this methodology by faculty with expertise in testing and as-
sessment can serve to reassure others within the institution that
alternative test forms are indeed parallel. Instrumentation may 
also be a problem when test vendors regularly update tests and 
then retire older versions. Should such turnover occur between 
a pre- and a posttest, the equivalence of the examinations may 
be questionable. 

A related confounding factor, classified as a threat to con-
struct validity (Cook and Campbell 1979), is the experimenter's
expectancies. That is, researchers' expectations can become
self-fulfilling prophecies. Within higher education, this phe-
nomenon may be a particular concern when faculty are asked to
rate students' development in those disciplines in which they
teach. Under these conditions, the effects of the experimenter's
expectancies can be reduced by procedures that "blind" faculty
to the characteristics of the student or test (for example, by not
informing faculty if the examination under review is a pretest 
or a posttest or if the examinee is a freshman or a senior). 

Statistical regression 

Statistical regression is a threat when scores at extremes of a 
scale are unstable. It is of particular concern in talent develop- 
ment assessments when students are classified into groups on 
the basis of pretest scores. Statistical regression: 

(1) operates to increase obtained pretest-posttest gain scores
among low pretest scores, ... (2) operates to decrease ob-
tained change scores among persons with high pretest
scores . . . , and (3) does not affect observed change scores
among scorers at the center of the pretest distribution (Cook
and Campbell 1979, pp. 52-53).

This problem can be corrected by using residual gain scores de-
rived from regression analysis instead of raw change scores
(Astin 1970). Although statistical regression can be reduced by
selecting instruments with high test-retest reliability, it should
be of some concern in any assessment program that focuses on
students with extremely high or extremely low pretest scores
(cf. Taylor 1985).

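Residual gain scores can be illustrated with a minimal sketch. It assumes the simplest case, an ordinary least-squares regression of posttest on pretest alone, and is not a reproduction of the procedure in Astin (1970); the function name and the four pairs of scores are hypothetical.

```python
from statistics import mean


def residual_gains(pre, post):
    """Regress posttest on pretest and return each student's residual.

    A positive residual means the student scored higher on the posttest
    than the pretest score alone would predict.
    """
    if len(pre) != len(post) or len(pre) < 2:
        raise ValueError("need matched pretest and posttest scores for 2+ students")
    mean_pre, mean_post = mean(pre), mean(post)
    sxx = sum((x - mean_pre) ** 2 for x in pre)
    if sxx == 0:
        raise ValueError("pretest scores must vary")
    slope = sum((x - mean_pre) * (y - mean_post) for x, y in zip(pre, post)) / sxx
    intercept = mean_post - slope * mean_pre
    return [y - (intercept + slope * x) for x, y in zip(pre, post)]


# Hypothetical matched scores for four students.
pre = [20, 40, 60, 80]
post = [45, 50, 70, 75]

print("raw gains:     ", [y - x for x, y in zip(pre, post)])
print("residual gains:", [round(r, 1) for r in residual_gains(pre, post)])
```

The raw gains (25, 10, 10, -5) make the lowest pretest scorer look like the biggest gainer and the highest scorer look like a decliner, the pattern regression toward the mean produces. The residual gains (1.5, -4.5, 4.5, -1.5) compare each student with the posttest score expected for that pretest level.
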
Mortality 

The validity of pre- and posttest comparisons of two or more 
groups of students is reduced if different types or numbers of 
students tend to drop out of one group more (or less) than the 
other(s). Mortality is similar to the problem of selective attri-
tion from the sample or cohort under investigation, discussed
earlier.



External validity 

Even when threats to internal validity are relatively low, the 
external validity, or the ability to generalize findings across 
subpopulations or from a sample to a population, may be ques- 
tionable. Can we expect that the gains shown by freshmen en- 
tering the institution in 1985 apply to freshmen entering the 
institution in 1990? One's confidence in such generalizations 
would be especially weak for institutions undergoing change in 
their marketing or admissions. External validity will be a par- 
ticular concern when assessment is voluntary rather than re- 
quired, as students who voluntarily participate in a testing 
program would be expected to differ on several dimensions 
from students who choose not to participate. 

Multiple measures 

Threats to internal and external validity may be minimized but 
never eliminated. When outcomes assessments are potentially 
threatening to faculty, students, or staff, these ever-present 
threats to validity can become political "ammunition" to dis-
credit or disregard information about outcomes.

One method that has been promoted for increasing assess-
ment validity is to employ multiple measures that "converge"
on the outcomes of interest (cf. Cook and Campbell 1979; Pal-
ola 1981). This approach may be useful for conceptual pur- 
poses, when available instruments do not match institutional 
definitions of key concepts, so that multiple instruments will 
provide a more useful indicator of students' development than 
any single instrument. Empirically, multiple measures provide 
an opportunity to determine the stability of key outcomes when 
assessed with different instruments. And multiple measures pro- 
vide political advantages by providing a "weight of evidence" 
that reduces skepticism. A combination of standardized and lo- 
cally developed instruments may also serve to satisfy external 
demands for accountability while simultaneously involving fac- 
ulty, staff, and students in self-reflection and institutional im- 
provement. Multiple measures do not offer a substitute for 
careful research design and test administration to avoid many of 
these confounds, however. In the face of confounding factors
such as history, maturation, and mortality, multiple measures 
will only increase the magnitude of error, uncertainty, and ulti- 
mately embarrassment in assessment. 

Unanticipated Effects of Assessment 
As demonstrated by the experiences of Alverno College, North-
east Missouri State University, and other schools that have
adopted value-added approaches, assessment is an educational 
intervention that modifies the same process it is designed to 
measure objectively. As such, assessment may have unantici- 
pated effects on students, both negative and positive. The man- 
ner in which faculty, counselors, and administrators administer 
and interpret assessment programs to students will influence the
reactive effects of testing, which might include: 

1. Test anxiety and stress. This particular risk accompanies
the phenomenon of bottoming out. Sensitivity to this is-
sue in test administration, debriefing, and presentation of
findings will substantially reduce this problem.

2. Fatigue. After spending several hours completing pre- or
posttesting, students' ability to concentrate on other work
may be limited. Therefore, testing should be scheduled
for the times when it is least likely to interfere with ongo-
ing class work and studying.

3. Emphasis on test scores rather than on the process of
learning. A vigorous assessment program may suggest to
students that test scores are more important than the
process of learning. "We now distribute grades and
scores as if students were in a contest with each other"
(Edgarton 1987, p. 109). It is the responsibility of the in-
stitution to communicate its underlying values to students
and to explain why and how the test scores are useful.

4. Better test-taking skills. For better or for worse, assess- 
ment and evaluation are very much a part of our culture. 
Continued exposure to assessment in a supportive envi-
ronment may help students to develop skills to cope ef-
fectively with tests and evaluations.

5. A sense of development and growth. Because of the rela- 
tive nature of most grading, students rarely have an op- 
portunity to document or observe their own intellectual 
development. Pre- and posttesting may provide students 
with feedback about their development. Even when indi-
vidual scores are not released (and the cohort is the unit
of measurement), students' experience of the instrument 
during posttesting relative to pretesting as well as the in- 
crease in achievement demonstrated by the group can be 
valuable information. 

6. Curiosity and motivation. Although bottoming out can be 
stressful for students, encountering material with which 
one is unfamiliar can also be stimulating. Challenging and 
engaging assessment instruments may motivate students to 
acquire specialized knowledge and skills or, conversely, 
to broaden their knowledge and skills. When students 
have the opportunity to discuss their testing experience 
with an academic counselor or advisor, they can plan a 
program that responds to their experienced needs and in-
terests.

COGNITIVE OUTCOME INSTRUMENTS



This section briefly describes more than 25 cognitive assess-
ment instruments that can be used within a talent development
perspective.

This survey of standardized, commercially available tests of
cognitive abilities should not imply that such instruments are
always the best solution to the challenges of measuring student
outcomes or development. In fact, a number of researchers and
practitioners have recently encouraged colleges and universities
to develop assessment programs that go beyond testing (cf.
Boyer 1987; Edgarton 1987; Mingle 1986). For example, "the
important fact to note is that where an assessment program is
making a difference, testing is not the sole source of informa-
tion" (Banta and Fisher 1987, p. 45). In addition, locally de-
veloped devices as well as standardized instruments may in
many instances prove highly useful for assessing outcomes.
Two institutions that have successfully used locally designed
instruments are Kean College (Kean College of New Jersey
Presidential Task Force 1980) and Alverno College (Alverno 
College Faculty 1985). 

With these caveats in mind, this section is designed to
acquaint readers with a broad range of standardized instruments
available for assessing outcomes. The selection of an
instrument for use within a particular institution, however, requires
consideration of the institutional context. From this perspective,
the "best" instrument is one that most closely matches the
goals and values of the institution and the structure of its
curriculum (cf. Ewell 1984). As this review indicates, tests that
purport to measure the same skill may vary widely in content
and structure as a result of the manner in which test makers
define concepts like comprehension, writing, or reasoning
ability. Thus, we leave to readers the task of assessing the fit
between what a specific test measures and what a specific class,
major, program, or school attempts to teach.

Congruent with the talent development philosophy, it is
recommended that student assessments be administered within a
pretest/posttest research design. Unfortunately, few cognitive
assessment instruments have been used in this manner. An
important direction for future research, then, is the collection of
additional empirical information about the suitability of these
instruments for application in talent development. Until such
information is available, faculty can best assess the quality of
alternative instruments for longitudinal, repeated-measures
assessments, taking into account local institutional and student
characteristics.

Another frequently expressed concern about pre- and posttest 
assessments is whether gain scores are valid indicators of
students' development (cf. Fincher 1985). The potential problem
of unreliable gain scores is significantly reduced when
aggregate rather than individual scores are used. Most talent
development assessments require group means derived from
assessments of large numbers of students. Under these condi- 
tions, gain scores should provide reasonably reliable indicators 
of development. Ultimately, however, unreliable gain scores 
are a function of the unreliability of the instrument itself. 
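
The contrast between individual and aggregate gain scores can be
illustrated with a short sketch. It assumes the classical test theory
formula for the reliability of a difference score and the usual
standard error of a mean. The function names and all numeric values
are hypothetical and are not drawn from any instrument reviewed here.

```python
import math


def gain_score_reliability(sd_pre, sd_post, rel_pre, rel_post, r_pre_post):
    """Reliability of an individual gain (post - pre) under classical test theory.

    sd_pre, sd_post   : standard deviations of pretest and posttest scores
    rel_pre, rel_post : reliabilities of the pretest and posttest
    r_pre_post        : observed pretest-posttest correlation
    """
    covariance_term = 2 * sd_pre * sd_post * r_pre_post
    true_gain_variance = sd_pre**2 * rel_pre + sd_post**2 * rel_post - covariance_term
    observed_gain_variance = sd_pre**2 + sd_post**2 - covariance_term
    return true_gain_variance / observed_gain_variance


def se_of_mean_gain(sd_gain, n_students):
    """Standard error of a group's mean gain, assuming independent students."""
    return sd_gain / math.sqrt(n_students)


# Hypothetical values: two tests with reliability .85, correlated .70.
individual = gain_score_reliability(10, 10, 0.85, 0.85, 0.70)
sd_gain = math.sqrt(10**2 + 10**2 - 2 * 10 * 10 * 0.70)
print(round(individual, 2))                      # reliability of one student's gain
print(round(se_of_mean_gain(sd_gain, 400), 2))   # precision of a 400-student mean
```

With these assumed values, the reliability of an individual gain is
only about .50 even though each test has a reliability of .85, while
the standard error of the mean gain for 400 students is less than
half a score point. An unreliable instrument still lowers both
figures, which is the point of the final sentence above.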

A final concern is the possible confounding effects of prac- 
tice. This issue is of particular concern for tests traditionally 
used in admissions, selection, and certification, because test 
preparation materials are often widely available for such instru- 
ments. While the effects of practice and specialized preparation 
on performance continue to be a topic of debate, potential users 
of standardized instruments should consider the possibility that 
gain scores may be confounded by the effects of test prepara- 
tion. (By asking students to indicate how they prepared for the 
examination and then merging this information with test scores 
and descriptive student data, regression analyses can be con- 
ducted to explore the effects of different methods of test prepa- 
ration.) 
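
The parenthetical suggestion above can be sketched in a few lines.
The example merges pretest and posttest scores with self-reported
preparation and fits a one-predictor least-squares regression of gain
on hours of preparation. The student identifiers, the scores, and the
helper names are hypothetical illustrations. An actual analysis would
also enter descriptive student data as additional predictors.

```python
def merge_records(scores, prep_hours):
    """Join (pretest, posttest) scores with reported preparation by student id."""
    merged = []
    for student_id, (pre, post) in scores.items():
        if student_id in prep_hours:  # students with no survey response are dropped
            merged.append((prep_hours[student_id], post - pre))
    return merged


def simple_regression(pairs):
    """Least-squares slope and intercept for (hours, gain) pairs."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Hypothetical data: (pretest, posttest) scores and hours of test preparation.
scores = {"s1": (50, 52), "s2": (48, 51), "s3": (52, 56),
          "s4": (47, 52), "s5": (51, 57), "s6": (49, 50)}
prep_hours = {"s1": 0, "s2": 2, "s3": 4, "s4": 6, "s5": 8}  # s6 did not respond

slope, intercept = simple_regression(merge_records(scores, prep_hours))
print(slope, intercept)  # gain per hour of preparation, gain with no preparation
```

In this constructed example the intercept estimates the gain
expected with no preparation, and the slope estimates the additional
gain associated with each hour of preparation. A slope near zero
would suggest that preparation is not confounding the gain scores.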

As noted earlier, the talent development approach to
assessment does not rely on the use of any particular instrument.
Rather, the appropriateness of an assessment device should be 
considered in light of the curricular or programmatic aims 
being assessed. To this end, a vast array of cognitive outcome 
instruments are presently available that measure outcomes the
authors view as critical to higher education's mission of student 
development. These instruments focus on such areas of student 
learning as basic skills, competence in specialized subjects, and 
general education (or comprehensive achievement). 

The instruments described in this chapter are by and large
nationally normed instruments developed for a college popula- 
tion. This section by no means provides a comprehensive set of 
cognitive assessment instruments. Rather, the instruments se- 
lected represent a wide range of standardized tests that are 
available for use with postsecondary students. Appendix A
provides an overview of the instruments included in this section.

The following descriptions reflect wherever possible the
firsthand experiences of the authors. In many cases, however,
secondary sources have been used. Thus, this section is intended
as a point of departure rather than the last stop for selecting
cognitive assessment instruments. Several reference books are
especially recommended for additional information about
standardized assessment instruments. Sweetland and Keyser's
Tests: A Comprehensive Reference for Assessments in
Psychology, Education, and Business (1986) provides brief,
easy-to-read descriptions of a large number of cognitive assessment
instruments. The ETS Test Collection Catalog, volume 1,
Achievement Tests and Measurement Devices (ETS 1986) also
describes cognitive assessment instruments available from a
variety of sources. For critical reviews as well as descriptive
information, an excellent encyclopedia is the Mental
Measurements Yearbook (Mitchell 1985). Test Critiques
(Keyser and Sweetland 1987) is less comprehensive than the MMY
but provides thoughtful reviews.

This review is organized into three broad categories: general
education tests, specific skills tests, and subject matter compe- 
tency tests. General education tests include instruments that 
provide an integrated approach to measuring an array of cogni- 
tive abilities typically associated with core curricula or general 
education programs. In contrast, specific skills tests focus on a 
single ability, such as reading, writing, mathematical reason- 
ing, or cognitive reasoning. Subject matter competency tests 
measure knowledge and skills associated with specific disci- 
plines. (A fourth category of interest to the authors but not ex- 
plored here involves assessments of vocational/practice skills 
for specific occupations.) 

Within each category, instruments are further divided accord- 
ing to the target population for which they were designed: 
lower-division students, upper-division students, or (occasion- 
ally) a full range of college students. Taking the test level of 
difficulty into account reduces (but does not eliminate) the 
problems of bottoming out and topping out. 

Typically, tests designed for lower-division students are ad- 
ministered as pretests for selection, placement, diagnosis, or 
curricular development. Useful information may be obtained by 
readministering these instruments as posttests, generally at the
end of a student's second year of college (following the
completion of the general education program, core curriculum, or
lower-division requirements). Such posttests provide an
opportunity to measure the change in students' performance over
time on either an aggregate or individual level, as appropriate.
In addition, posttests may contribute to a better understanding 
of the impact of the lower-division curriculum, the effective- 
ness of decisions about placement, and the degree to which the 
institution is achieving its educational goals. 

On the other hand, tests designed for upper-division students 
are often used for admission to graduate or professional schools 
or for certification. Additional benefits may be obtained by
pretesting with these instruments at an earlier time, for example,
when students enter their selected majors or begin
upper-division study. Pretesting may be useful for diagnosing
students' strengths and weaknesses and for indicating the skills
and abilities that students must master to reach the standards 
established by graduate and professional programs. Pretesting 
also provides an opportunity to collect baseline information 
against which students' performance on later posttests can be 
assessed. The gains shown by students will be useful for com- 
paring the educational impact of particular programs of study 
and/or developmental patterns shown by different subgroups of 
students (assuming large enough sample sizes to provide stable 
comparisons). 

In other words, instruments designed for other purposes may 
potentially support a talent development approach to assessment
when administered as part of a longitudinal, repeated-measures 
design. As discussed in more detail later, because many institu- 
tions often use standardized instruments for selection, place- 
ment, or certification, relatively minor adjustments would be 
needed to readminister the instruments and thereby obtain
information about students' talent development.

General Education Tests 

This section describes assessment instruments that measure a 
range of cognitive concerns and subject areas at a level appro- 
priate for undergraduate college students. 

Instruments geared toward lower-division students (or below) 
The ACT Assessment Program and College Board Scholas- 
tic Aptitude Test. These two instruments may be useful for 
more than their traditional applications in admissions. The ACT 
Academic Tests provide scores in four areas: English usage, 
mathematics usage, social studies reading, and natural sciences 
reading. These scores, as well as the composite, are presented 
on standard scales ranging from 1 to 36. In addition to its
traditional use in admissions, ACT encourages application of results
in academic counseling, guidance, placement, and orientation 
(Aiken 1985; Kifer 1985). In support of these aims, the ACT 
High School Report provides raw scores and percentile rank as 
well as the standard scale scores (Kifer 1985). Beyond these
applications, the Academic Tests are useful for talent develop- 
ment purposes on a pre- and posttest basis to assess the com- 
prehensive learning of students during the first two years of 
college. 

The Scholastic Aptitude Test (SAT) also provides up to four
scores. In addition to the verbal and mathematical scores, ver- 
bal subscores for reading and vocabulary are available, as well 
as results for the Test of Standard Written English. Although 
the SAT emphasizes mathematical skills somewhat more than 
the ACT (cf. Aiken 1985), evidence suggests that the ACT and 
SAT provide very similar information. The mathematics scales
of the two tests are highly correlated, and it is possible to
obtain an excellent estimate of the SAT verbal score from a
combination of the English, social studies, and natural sciences
scales (Astin, Henson, and Christian 1978). The use of the 
ACT Academic Tests and the SAT on a pre- and post-test basis 
represents a notable expansion of the value of these instruments 
beyond their traditional conception as tests of admission. 

General Examinations of the College-Level Examination
Program. The General Examinations of the College-Level
Examination Program (CLEP) include material that is usually
covered in the first two years of college. The General Examinations
address five broad areas (English composition, humanities,
mathematics, natural sciences, and social sciences and history),
emphasizing "concepts, principles, relationships, and applications of
course materials" (Sweetland and Keyser 1986, p. 374). CLEP is
designed for both traditional and nontraditional students to earn 
college credit for skills and knowledge they may have acquired 
outside an academic setting. 

Reviewers differ in their evaluations of the General
Examinations. Some are generally positive, noting that the CLEP
General Examinations "represent a reasonable balance between
factual recall and application" (Dressel 1978, p. 634). Others
are less sanguine, arguing that the exams do not adequately
measure critical thinking and interpretation and instead
emphasize factual recall and simple problem solving (Wallace 1978).
And although Dressel praises the technical quality of the
exams, Wallace warns that "students just completing the courses
for which the CLEP tests are designed to measure equivalence
generally answer fewer than half the items correctly. This
could be deleterious to the score characteristics of the
tests . . ." (p. 639).

Sequential Tests of Educational Progress, series III. The Se- 
quential Tests of Educational Progress (STEP) are similar to the 
CLEP in that both are designed to assess academic mastery and 
to assist in diagnosis. The most recent set (series III) includes 
seven self-contained tests designed for grade levels 10 to 12.9. 
(Series II, which is still available, includes some instruments 
for students in grades 13 and 14.) The tests measure various 
components of achievement in general education, including 
English expression, reading, mathematics, science, and social 
studies. The examinations are designed to emphasize applica- 
tion of knowledge over recall of facts. 

Some concern has been expressed about both the validity and 
reliability of the STEP (Floden 1985), specifically about con- 
tent validity of the separate instruments. Floden suggests they 
may be better viewed as measures of "general ability" rather
than of specific skills and abilities. Another researcher recom- 
mends against the use of STEP for individual diagnosis and 
placement until future research establishes its validity for such 
purposes (Shanahan 1985). 

Stanford Test of Academic Skills (1982 edition). The 
Stanford Test of Academic Skills (TASK) provides a compre- 
hensive assessment of basic skills considered necessary to un- 
dertake college-level work, including reading comprehension, 
vocabulary, English, mathematics, science, social science, and 
use of information. The instrument was developed to reflect the 
instructional objectives of secondary schools, based on a review 
of textbooks, curricula, and state guides (Ory 1985). The au- 
thors' interest is in the version of TASK designed for grades 9 
to 13, although another version for grades 8 to 12 is available 
as well. An optional writing assessment program accompanies 
the TASK; it is designed to measure syntax, organization, vo- 
cabulary, the quality of ideas, and general merit. 

The usefulness of this instrument for talent development as- 
sessment is enhanced by the provision of raw (absolute) scores 
for subscales as well as for "content clusters" within
subscales. Nonetheless, "one would not be interested in using
TASK if the instructional objectives used in its construction did
not match the objectives of the user's school" (Ory 1985, p.
1469).

Instruments geared toward upper-division students 
Graduate Record Exam. The General Test of the GRE, 
widely used as an admissions tool for graduate programs, offers 
opportunities for talent development assessments similar to 
those provided by the SAT. The GRE provides scores for ver- 
bal, quantitative, and analytical reasoning abilities. Use of this 
measure for pre- and posttesting would be most appropriate 
with upper-division high achievers to avoid the negative effects 
of bottoming out. Future analyses that offer methods for com- 
paring performance on the SAT and GRE would significantly 
extend the potential applications of these measures. Further, the 
relativistic nature of both GRE and SAT scores reduces the 
usefulness of the instruments for talent development applica- 
tions. 



ETS Academic Profile. The Academic Profile is an innovative 
new instrument, now in its pilot year. It was designed for stu- 
dents who have completed their general education programs 
"to measure academic skills (college-level reading, college- 
level writing, critical thinking, and using mathematical data) in 
the context of three major discipline groups (humanities, social 
sciences, and natural sciences)" (Educational Testing Service 
1987, p. 1). Two versions of the Academic Profile are avail- 
able: a three-hour version, which includes 144 items and for 
which ETS provides both group and individual scores, and a 
one-hour short form, which includes 48 items and for which 
ETS provides only group scores. In addition, ETS offers an op- 
tional essay that is scored by the institution, using ETS man- 
uals. 

As part of the pilot year, ETS is conducting validation stud- 
ies and actively soliciting feedback from participating institu- 
tions about the usefulness of the instrument. Because it is 
specifically designed for outcomes assessments and apparently 
avoids some of the logistical problems posed by the McBer Be- 
havioral Event Interview and the ACT COMP Comprehensive 
Exam, the Academic Profile may come to fill an important gap in
the assessment of general education.

Graduate Management Admissions Test. The GMAT is the
first of several instruments reviewed here that were developed 
as admission tests for professional school. Traditionally associ- 
ated with admissions, these instruments could be used in longi- 
tudinal designs to assess the effectiveness of academic 
programs in preparing students for admission to professional 
schools. Pretesting at the beginning junior level (following 
completion of lower-division requirements) could be imple- 
mented by using older versions of these tests, which are pub- 
lished in the many preparation manuals now available 
commercially. 

The GMAT is designed to predict success in the first year of
graduate study in business. In effect, it is a "multiple-choice
paper-pencil test measuring general verbal and quantitative abil- 
ities. It does not measure proficiency in undergraduate business 
or economics courses" (Sweetland and Keyser 1986, p. 399). 
Use of the GMAT on a pre- and posttest basis may be helpful 
for those institutions especially concerned with the educational 
preparation of students for admission to MBA study. Moreover, 
apart from the issues of construct and predictive validity that 
have been raised generally with the GMAT as an admissions 
instrument (Crosby 1985), the Practical Business Judgment
section and the items that require students to interpret charts,
graphs, and tables have useful face validity for talent develop- 
ment. 

Medical College Admission Test. Relative to the GMAT, the
MCAT has more utility for general education pre- and post- 
tests, as it assesses knowledge of science (emphasizing biology, 
chemistry, and physics), application of science knowledge 
through problems in science, and varied analytical skills in 
reading and quantitative areas. Given the prerequisite
undergraduate coursework expected in biology, physics, and general
and organic chemistry, this test lends itself well to pre- and 
posttests. 

Law School Admission Test. The LSAT may also be
appropriate for talent development assessments during the
undergraduate years. The post-1982 versions of this instrument include
four subtests (reading comprehension, analytic reasoning,
logical reasoning, and "issues and facts"). Each subtest appears to
measure verbal reasoning skills, however (Melton 1985). 
For purposes of talent development, a major drawback is that 
the Law School Admission Council reports only the total score.
Consequently, student gains and losses on individual subtests of 
the LSAT would not normally be available. A talent develop- 
ment approach might be better served by the Graduate Record 
Examination General Test, with its separate reporting of the 
student's verbal, quantitative, and analytical scores. Use of the 
LSAT on a pre- and posttest basis appears to make most sense 
for those institutions interested in students' preparation for law 
school. 



NTE Core Battery. The NTE Core Battery (formerly known 
as the Common Examinations of the National Teacher Exami- 
nation Program) measures the academic proficiency of under- 
graduate students and recent graduates of teacher preparation 
programs (Scannell 1985b). The Core includes separate two- 
hour tests in communication skills, general knowledge, and 
professional knowledge. 

The communication skills test includes subtests for listening, 
reading, and writing. The listening section presents material 
and questions via a recording and requires the examinee "to 
identify the content of a message or a paraphrase of the con- 
tent, to identify a main idea, to evaluate, and to infer from oral 
signals" (Scannell 1985b, p. 1067). The reading section
addresses similar analytic concerns, using passages topically
related to education. The writing assessment includes multiple
choice items about grammar, punctuation, and effectiveness of 
expression as well as an essay component, in which students 
are asked to relate a personal experience. Generally, only a
total scale for communication skills is reported.

The test of general knowledge addresses literature and the
fine arts, mathematics, and a variety of science and social sci- 
ence areas. While the social studies and literature and fine arts 
sections require students to demonstrate skills in interpretation
and application of knowledge, items in the mathematics and 
science areas unfortunately appear to be geared more to the 
secondary than the postsecondary level of difficulty (Scannell 
1985b). Again, only the total score for general knowledge is 
provided. 

The test of professional knowledge addresses pedagogical
issues related to teaching practices, theory, and evaluation. For
an institution engaged in teacher preparation, assessment with
this subtest appears to provide a comprehensive measure of the
professional knowledge generally included in teacher education
programs.

Based on the high intercorrelations among the three tests of 
the Core Battery, these scales may "measure similar knowl- 
edge and skills and . . . may not reflect distinct domains of 
proficiency" (Nelsen 1985, p. 1066). Nonetheless, the NTE 
tests may well be preferable to the alternatives for assessing the 
academic skills and abilities of beginning teachers (Nelsen
1985). Although generally administered to college seniors who
are preparing to enter the teaching profession, the NTE Core
Battery seems well suited for earlier pretest administration as
part of a longitudinal research design. In fact, NTE encourages
the use of the Core Battery for both "standardized examination
of academic achievement for college students entering or in
teacher education programs and for college seniors completing 
such programs" (quoted in Quellmalz 1985, p. 1188). 



NTE Pre-Professional Skills Test. Basic proficiencies needed 
for a teaching career are also the focus of the Pre-Professional
Skills Tests (PPSTs) of the National Teacher Examination Pro- 
gram. The PPST includes separate instruments for reading, 
mathematics, and writing. Both the reading and mathematics 
tests are multiple choice, while the writing test includes both
multiple choice and an essay. 

The PPST is administered both to students interested in en- 
tering teacher training programs (as an admissions tool) and 
also to students who are completing such programs (as part of 
the certification process) (Bauernfeind 1987). While these ap- 
plications (as well as the ability to derive raw scores from the 
scaled scores provided) suggest that the PPST may be appropri- 
ate for talent development assessments, readers should be 
aware of several concerns reviewers have expressed. 

The PPST, for example, has been criticized for its emphasis 
on minimum standards; institutions might consider forgoing 
tests of "minimum competency" in favor of alternative instru- 
ments that focus on "college-level mastery" (Quellmalz 1985). 
Further, the NTE is unclear about the target audience(s) for the 
test, and high correlations between the reading and math sub- 
tests suggest that one or both of these tests may confound mea- 
surement of the basic skills (Quellmalz 1985). And the content 
validity of the PPST essentially reflects subjective judgments 
about the knowledge that is important for future teachers (or
others) to master (Bauernfeind 1987). 

Instruments geared toward all levels of college students 
ACT College Outcomes Measures Project. The ACT COMP 
represents an innovative approach to measuring outcomes. This 
test battery was designed to measure "general" outcomes of
college or students' abilities "to apply specific facts and con- 
cepts in work, family, and community roles" (Forrest and 
Steele 1982, p. 1). That is, the tests attempt to go beyond spe- 
cific course content to measure the more general abilities and 
competencies that are often identified as the goals of general 
and liberal education. The COMP is designed to measure stu- 
dents' competence in three content areas (functioning within so- 
cial institutions, using science and technology, and using the 
arts) and three process areas (communicating, solving prob- 
lems, and clarifying values). ACT offers two forms of the 
COMP, a six-hour composite form and a shorter (under three 
hours) objective form. 

In both forms, students are required to respond to a variety 
of stimuli, including text, audio tapes, and films. In the com- 
posite form, response modes also vary, including multiple 
choice, short answers, essays, and tape-recorded speeches. The 
objective form includes only multiple-choice responses. ACT 
has conducted (and will disseminate upon request) several stud- 
ies to establish the validity and reliability of the instruments for 
different populations. 

COMP scores are provided in a detailed profile, with scores 
for each subtest presented as a percentile relative to the norma- 
tive group. This relative scaling system somewhat limits the 
test's usefulness for talent development assessment. 

ACT has attempted to fill an important gap in the range of 
assessment options with the COMP, but several characteristics 
of the composite test limit its usefulness. First, administration 
of the composite form of the COMP is time consuming and 
complex, requiring numerous audiovisual devices, two separate
sittings, and six hours of testing time. Students' fatigue,
equipment failures, and other logistical problems can severely reduce
the validity of COMP scores. The talent development approach
compounds these problems, as both pre- and posttesting are
needed. After once experiencing the difficulty and length of the
pretest, students may be reluctant to participate in a posttest
(Astin and Ayala 1987). ACT recognizes the importance of
student recruitment and offers some assistance in this area.

The objective form is shorter and simpler and is the form
that higher education institutions have used most often. While
logistical problems and costs are reduced, the benefits of multi- 
ple options for response are lost. Further, the objective form 
yields scores that are accurate at the aggregate level only, 
whereas the comprehensive form yields scores that may be ap- 
propriately used for measurement for individuals. 

McBer Behavioral Event Interview. The Behavioral Event 
Interview (BEI), an integral part of the Student Potential Pro- 
gram of the Council for Adult and Experiential Learning 
(CAEL), is designed to identify a broad range of students'
talents and potential that relate to success in education. The BEI
involves a one-hour, in-depth probing strategy that elicits infor- 
mation of a critical incident nature. The trained interviewer 
evaluates the data, coding the behavioral insights from the in- 
terview as evidence of specific capabilities, which range from 
initiative, persistence, and planning skill to self-confidence, in- 
fluence, and leadership talents. 

An evaluation of this assessment procedure suggests that 
"the BEI has a significant degree of construct validity" (Astin, 
Inouye, and Korn 1986, p. 32). The instrument can effectively 
be applied to predict student outcomes, including grades and 
academic progress. Despite these positive indicators, further 
evaluation of the BEI is recommended with larger sample sizes 
than have previously been obtained. Further, the BEI should be 
used in longitudinal assessments to explore the institutional and 
educational factors that facilitate growth on this measure
(Astin, Inouye, and Korn 1985). Administration of the BEI,
however, requires that an institution provide time and money to
obtain the special training for interviewers needed for this
time-consuming process.

Specific Skills Tests 

Tests reviewed in this section focus on a single skill, often con- 
sidered of critical importance in undergraduate education, in- 
cluding writing, reading, mathematical reasoning, verbal 
reasoning, and critical thinking.

Instruments geared toward lower-division students 
College Board English Composition Test with essay. This 
test has two parts: an essay question and a set of 60
multiple-choice items pertaining to such concerns as idiomatic
expression, usage, grammar, and diction. Instructions for the
relatively brief (20-minute) essay direct examinees "to plan and
write an essay, agreeing or disagreeing with a statement
provided and supporting their opinion with specific examples from
personal experience or knowledge" (Scannell 1985a, pp.
357-58). The exam is scored by the testing agency, with each essay
read independently by two trained readers. This scoring service
is not inexpensive, however, which explains the test's present
schedule of administration: once a year (usually in December)
at test centers established by the College Board.

Two components of the CLEP General Examination,
reviewed in the previous section, offer alternative approaches to
the assessment of writing skills. The CLEP General
Examination in English Composition, edition two, measures
college-level competency in a similar two-part approach, with an essay
section and a 65-question objective section, the latter dealing
primarily with logic and sentence structure (Sweetland and
Keyser 1986). Whereas the College Board English Composition
test allocates only 20 minutes for the essay, this test has a
45-minute essay in which students are asked to present logical
arguments and evidence to support a particular point of view.

In addition, the CLEP Humanities (Freshman English) test
offers an optional essay section to accompany a set of objective 
questions (Sweetland and Keyser 1986). The Freshman English 
test allocates 90 minutes for the optional essay, significantly 
longer than the CLEP and the College Board composition tests. 
During this time, the student is called upon to deal with three 
writing tasks in which "the topics present concrete problems 
involving personal knowledge and require control and flexibil- 
ity in the use of language" (p. 382). 

Potential users of any of these three composition tests should 
recognize that they are instruments assessing "standard" or 
"textbook" English competency on the part of entering or first- 
year college students, an especially important consideration for 
those institutions with populations whose English skills are not
standard. Further, a low level of reliability is a frequent prob-
lem with any essay test (Bauernfeind 1987). A third considera-
tion involves resources, both personnel and financial. The 90-
minute optional essay section of the Freshman English subject
examination is designed to be graded by personnel from the ex- 
aminee's institution, calling for the commitment of significant 
staff and faculty time. Where the essay is a required part of the 
instrument, however (as in both the CLEP and College Board
examinations), the testing organization provides a grading ser- 
vice. 

Nelson-Denny Reading Test, forms E and F. The Nelson-
Denny Reading Test focuses on the development of skills in
three major areas of reading ability: vocabulary development,
reading comprehension, and reading rate. Each form of the test 
includes two subtests, vocabulary and comprehension, both 
using a multiple-choice format. With the 100 items in the vo- 
cabulary subtest, students are to choose from five options the 
one that best completes a sentence or defines a word. Simi-
larly, on the comprehension subtest, the examinee responds to
36 multiple-choice questions related to eight passages covering 
such areas as the humanities, science, and social science. It is 
also from reading the first of these passages that an individual's 
rate is determined— the number of words read by one minute 
into the passage. 

Although the Nelson-Denny is widely used, reviewers raise 
several important questions about its usefulness for college stu- 
dents. While the instrument is normed for secondary and col- 
lege students, "the test does not discriminate well among good 
readers" (Hambleton 1987, p. 476). The Nelson-Denny sample
underrepresents blacks and Latinos and students from regions
with significant enrollment in "major" institutions (Ysseldyke 
1985). Further, the reading passages sampled "do not appear 
representative of the text types students will regularly confront
in science, mathematics, vocational education, and other 
courses" (Tierney 1985, p. 1036), and it is questionable
whether the test has "precision and generalizability" to support 
its use for diagnostic and placement decisions (Tierney 1985). 
To ameliorate these problems, institutions should review the fit 
between test content and curriculum content before using the 
test (Van Meter and Herrmann 1986-87). 

Writing Proficiency Program. Its publisher describes the 
Writing Proficiency Program as a "criterion-referenced assess- 
ment and instructional system." The assessment instruments 
are only one component of a comprehensive writing program 
geared toward students in grades 11 through 13. Because the 
program includes both an initial test (pretest) and a mastery test 
(posttest), the assessment component may be useful for talent 
development applications. Each instrument includes both
multiple-choice and essay questions. The course instructor
scores the exams, which yield subscores for a variety of techni- 
cal and expressive aspects of writing. Given the absence of em- 
pirical information about test reliability and validity, however, 
this package is likely to be most useful as a resource for insti-
tutions interested in developing their own writing assessments
(cf. Polloway 1985). 

Instruments geared toward upper-division students
Western Michigan English Qualifying Examination. The
Western Michigan English Qualifying Exam (EQE) is used to 
gauge students' levels of English usage skills. This 195-item 
assessment tool addresses "grammatical errors (30 items), 
punctuation for meaning (45 items), sentence structure (30 
items), spelling (30 items), word usage (30 items), and reading 
comprehension and rhetorical style (30 items)" (Sweetland and
Keyser 1986, p. 247). Designed for measuring the English 
skills of college juniors through entering graduate students, the 
EQE uses items taken from the written work of students at this 
level. 

The EQE seems appropriate for use on a longitudinal basis 
with upper-division students. Pretesting with this instrument 
during the junior year is potentially useful for diagnostic pur-
poses, while subsequent retesting during the senior year would
provide a posttest measure of change. 
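
The pre- and posttest comparison described here can be illustrated with a short calculation. The Python sketch below is only a minimal illustration under stated assumptions: the function name gain_summary, the student identifiers, and the scores are all hypothetical, and the scores are invented rather than drawn from the EQE or any other instrument. It matches students who took both administrations and summarizes their raw change.

```python
from statistics import mean, stdev


def gain_summary(pretest, posttest):
    """Summarize change between an entry (pretest) and exit (posttest) score.

    Both arguments map a student identifier to a raw score. Only students
    who appear in both administrations are compared.
    """
    matched = sorted(set(pretest) & set(posttest))
    if not matched:
        raise ValueError("no students took both administrations")
    # Raw gain per matched student: posttest score minus pretest score.
    gains = {student: posttest[student] - pretest[student] for student in matched}
    values = list(gains.values())
    return {
        "n_matched": len(matched),
        "mean_gain": mean(values),
        # A standard deviation needs at least two matched students.
        "sd_gain": stdev(values) if len(values) > 1 else 0.0,
        "gains": gains,
    }


# Hypothetical scores, invented for illustration only.
junior_year = {"s01": 120, "s02": 135, "s03": 150, "s04": 110}
senior_year = {"s01": 131, "s02": 140, "s03": 149, "s05": 160}

summary = gain_summary(junior_year, senior_year)
print(summary["n_matched"], summary["mean_gain"], summary["sd_gain"])
```

Students tested only once (s04 and s05 in this invented data) drop out of the comparison, so attrition between the junior and senior years affects which students the average describes. Raw gains of this kind also carry the measurement error of both administrations and are therefore better read at the aggregate level than for individual students.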

Doppelt Mathematical Reasoning Test. Developed as "a
high-level measure of mathematical skills comparable to the
Miller Analogies Test" (Sweetland and Keyser 1983, p. 254),
the Doppelt is widely used as a measure of mathematical rea-
soning ability in the selection of students for graduate work.
Given this design, administering the instrument as a pretest to 
juniors followed by exit-year posttesting would be one poten- 
tially useful application of the talent development approach. 

Pre- and posttesting with this instrument could provide col- 
leges and universities a measure of their curricular impact and 
of individual students' growth in this skill area. Institutions
with interest in this 50-problem multiple-choice test should be
aware, however, that "apparently no systematic approach has 
been made to determine how valuable the test is for graduate 
students in other areas" besides mathematics and statistics 
(Clemens 1965, p. 725). 

Furthermore, potential talent development users of the Dop-
pelt should note that "none of the problems involve mathemat-
ics beyond the usual secondary school level" (Clemens 1965,
p. 725), an observation that reflects generally on the mathemat-
ical literacy of college students. How this fact matches institu-
tional expectations for students' basic skill development in this
domain is thus an essential consideration for users. Further re-
search seems called for to determine the utility of this instru-
ment for talent development assessment with different college
populations.

Miller Analogies Test. The Miller Analogies Test (MAT) is
well known as an instrument designed to aid admission to grad-
uate school by measuring verbal reasoning skills. The test in-
cludes 100 items, each of which requires the student to select,
from multiple options, the best completion to an analogy. The
authors' major concern is not with issues of predictive validity 
that preoccupy so many others. Rather, the talent development 
approach leads to an interest in better defining the cognitive 
skills measured by the instrument. In this regard, the MAT is 
a difficult test that "measures largely verbal comprehension 
in the context of general information" (Willingham 1965,
p. 749). 

For institutions wishing to assess such verbal ability, this in- 
strument has multiple pre- and posttest benefits. The MAT pro- 
vides information about verbal skills and is widely accepted as 
a tool for graduate admissions. Consequently, institutions using 
the MAT might obtain longitudinal data that could be periodi- 
cally compared with standards set for entering graduate students 
in a variety of disciplines. A special strength of the MAT is its 
"high ceiling," or its ability to differentiate among students 
with high levels of verbal ability (Geisinger 1987). Further, the 
MAT yields absolute (not relativistic) scores, and item analyses 
can be obtained by administering widely available "practice 
tests." A potential weakness of the MAT, however, is that in- 
dividual scores can be significantly improved through test train- 
ing (Geisinger 1987). When used in longitudinal assessments, 
such training effects may confound the measurement of actual 
increases in verbal abilities. 

Instruments geared toward all levels of college students 
Watson-Glaser Critical Thinking Appraisal, forms A
and B. The Watson-Glaser Critical Thinking Appraisal is de-
signed to measure adults' ability in an area that is frequently
identified as an important goal of higher education (Woehlke
1987). This ability is increasingly important as both an educa-
tional goal and a focus of evaluation in selecting employees
(Helmstadter 1985). The Watson-Glaser includes five subtests:
inference, recognition of assumptions, deduction, interpretation, 
and evaluation of arguments. 

In this 40-minute test, examinees contend with 80 items (16 
per subtest) that require them to recognize both valid arguments 
and inconsistencies in reasoning and to demonstrate their level 
of skill in making inferences and noting implications from 
statements. The Watson-Glaser is geared to a ninth grade read-
ing level, even though it clearly calls for reasoning skills that
are above that level, but in this way it largely avoids contami- 
nating the assessment of critical thinking abilities with reading 
abilities. In content, the items include both neutral and more 
controversial topics, focusing on problems and issues of data
interpretation likely to be encountered through contemporary
media. 

Although reviewers generally regard the Watson-Glaser as a
well-constructed test, some cautions are necessary (Berger
1985). First, the test does not clearly distinguish between items
designed to be neutral versus those designed to be more contro-
versial. Second, this instrument assesses critical thinking only
through reading; one can but speculate as to the comparability 
of findings were students' critical thinking abilities assessed 
with a listening test. 

Yet another caution relates to using Watson-Glaser assess- 
ment data for advising individual students. The subtest scores 
are based on a small number of items, which the authors recog-
nize as providing insufficient reliability for individual evalua-
tion or diagnosis. But for talent development purposes, these
subtest scores have utility for the analysis of the critical think- 
ing abilities at an aggregate level, which in turn could be re- 
lated to the types of critical thinking training that might be 
most needed by such groups. 

Finally, the utility of the test largely depends on one's level 
of agreement with the operational definition of critical thinking 
embodied by the Watson-Glaser (Berger 1985; Helmstadter
1985).

Cornell Critical Thinking Test, level Z. Those with interest
in the assessment of critical thinking have an alternative instru- 
ment available with the Cornell Critical Thinking Test, level Z, 
designed for grades 13 and over. Like the Watson-Glaser, the
Cornell Test is designed to assess talents for critical thinking of 
college students and other older adults. Through 52 items, di- 
vided into seven sections, this test seeks to assess such abilities 
as detecting equivocal argument, evaluating the reliability of 
observations, judging the authenticity of sources, and discover-
ing various types of assumptions.

The American Psychological Association has rated the tech-
nical construction of the Watson-Glaser above the Cornell test
(Woehlke 1987). The adequacy of either test for a talent devel-
opment approach, however, should be judged by the particulars
of an institution's curriculum in this area. The correct choice 
requires close examination of the fit between the goals of the 
curriculum and the skills measured by the assessment instru- 
ments. 

Goyer Organization of Ideas Test, form S. The Goyer Or-
ganization of Ideas Test (GOIT) is a 30-minute, multiple-choice
test focusing on various aspects of one's ability to organize
ideas verbally. GOIT test takers are faced with questions about 
outlining, with items on the ordering of statements, and with 
items that require selection of the most appropriate word,
phrase, premise, or unifying statement. 

"Although the test measures something consistently, it is un- 
clear if that something is a generalized organizational skill or 
the content of an introductory speech communication class" 
(Brown 1985, p. 618). The GOIT may be particularly useful, 
however, "for measuring the effect of efforts to upgrade the 
skills embodied in its terms" (Frary 1985, p. 619). What we
might have, then, is a relevant assessment instrument for
courses where organizational skills are taught.

The GOIT requires a relatively high level of reading ability 
(Frary 1985), and the test may confound the organizational 
constructs it is designed to measure with "general verbal abil- 
ity" (p. 619). A final concern is that the normative data pro- 
vided are inadequate, drawn from a sample that is not 
representative of either a college student or general adult popu- 
lation. 

Subject Matter Competency Tests 

This section considers some of the instruments available to as- 
sess students' mastery of diverse subject matter. The extensive 
range of curricular specialties prohibits a comprehensive re-
view, and the presentation consequently emphasizes examina-
tion programs with a variety of subject matter tests. The
institutional utility of these tests in a talent development ap-
proach, however, depends on their relevance to the actual cur-
riculum content taught. Faculty must review each assessment
instrument to determine how well it addresses the various sub- 
ject matter competencies expected of students by their particu- 
lar disciplines. 

Instruments geared toward lower-division students 
Advanced Placement Program of the College Entrance Ex- 
amination Board. This Educational Testing Service-College 
Board production is "designed to assess achievement, place en- 
tering college students, and assist in granting credit to students 
who have done college-level work in secondary school" (ETS
1980, p. 12). Although the exams are designed for administra-
tion in high schools, the students who elect to take them demon-
strate relatively high levels of cognitive skill and preparation
for postsecondary education.

The Advanced Placement Program represents a major effort, 
with examinations that relate to 24 introductory college courses 
in 13 fields: biology, chemistry, physics, French, German, 
Latin, Spanish, mathematics, music, computer science, Eng-
lish, history, and art. With no test longer than three hours,
these paper-and-pencil instruments (of a largely multiple- 
choice, objective nature) have potential to efficiently gauge stu- 
dents' learning, especially for lower-division students. 

Unfortunately, the grading procedures employed by ETS sig- 
nificantly constrain the use of the Advanced Placement Pro- 
gram for talent development aims. With examinations graded 
on a five-point scale, an institution obtains only a crude esti-
mate of students' learning. Information that relates to specific
items or subfields of knowledge and to the particular cognitive 
or curricular objectives of a college academic program is not 
provided. Another barrier to the use of the Advanced Place-
ment exams for talent development purposes is their cost: ETS
currently charges $53 per person per test. Under these condi- 
tions, the contributions of the program are probably limited 
simply to serving as a placement tool for the institution and as 
a way to earn credit for the degree seeker. 

College Board Achievement Tests. The Educational Testing 
Service also offers a variety of subject matter tests comparable
to those of the Advanced Placement Program. The primary use
of these Achievement Tests, however, seems more for admis- 
sions and for the prediction of students' performance in college 
courses. Still, as they are designed for use with students in 
grades 11 through 13, colleges and universities could readmin-
ister these 60-minute, paper-and-pencil tests as measures of tal-
ent development.

Instruments geared toward upper-division students 
GRE subject tests. Usually viewed as a graduate admissions 
assessment program, the GRE test results of graduating seniors 
in these subject matter exams might have additional usefulness 
for talent development. Earlier pretest administration of the 
GRE would provide an institution with baseline data against 
which to assess cognitive outcomes on exit; however, the full
benefit of this pre- and posttest approach is again limited to the
aggregate level, given the GRE's scaled scores. Despite the ab-
sence of sufficient subtest or item information, the talent devel-
opment use of the GRE should enable academic departments
and programs to more adequately assess both their effect on the 
learning of student cohorts and the appropriateness of their cur- 
ricular preparation, as measured by this nationally normed in- 
strument. 

NTE Specialty Area Tests. The NTE Specialty Area Tests
provide opportunities to assess competency in 26 fields of
study, including such teaching-related areas as art and music
education, early childhood education, and the teaching of read-
ing, social studies, and speech communication. The Specialty
Area Tests are intended to measure the knowledge and abilities
of students who have majored in the area(s) assessed by the
tests (Scannell 1985b). If students were pretested with the rele-
vant instrument upon selection of a major and then posttested
upon formal completion of requirements in the major, a helpful 
talent development assessment would be available in that se- 
lected subject. Thus, the concern is not with issues of predic- 
tive validity that seem to surround these instruments but with 
what the data add to our understanding of students' achieve- 
ment and instructional impact. 

Instruments geared toward all levels of college students 
ACT Proficiency Examination Program. The College Profi-
ciency Exams address a wide range of competence in subject
matter. Developed as part of the New York State Regents Ex-
ternal Degree Program, the ACT Proficiency Examination Pro-
gram (PEP) offers 49 tests: 31 objective, seven essay, and 11
that combine objective and essay components. Eighteen of the 
examinations are in the accounting, marketing, finance, and di-
verse management areas; still others have relevance to such
professional fields as nursing, education, and criminal justice;
and others are concerned with history (American or Afro-Amer-
ican), Shakespeare, earth science, physical geology, or anat-
omy and physiology (Mitchell 1985).

Primarily aimed at assessing proficiency in subject matter at- 
tained outside the usual classroom setting, this ACT program 
provides a basis for awarding college credit and placing return- 
ing students in appropriate classes. The instruments in the pro- 
gram seem especially appropriate to the talent development 
needs of institutions with nontraditional students who bring sig- 
nificant postsecondary learning experiences to their college ca- 
reer. These tests should be reviewed, however, to establish 
their utility for institutions with more traditional college popula- 
tions. Pretest and posttest administration of these tests may be 
especially useful for considering the contribution of various ac- 
ademic programs to the types of knowledge desired by increas- 
ingly career-minded undergraduates. 

CLEP Subject Examinations. Like the ACT Proficiency Ex-
amination Program, CLEP Subject Examinations serve as a ve-
hicle for awarding college credit for knowledge acquired
outside the usual classroom. This broad effort by ETS provides
46 subject matter tests in such categories as business, composi- 
tion and literature, foreign languages, history and social sci- 
ences, and the sciences and mathematics. Given the range of 
influences (formal and informal) to which a learner is exposed, 
these CLEP-type examinations appear to offer instruments 
suited to assessment of competence in subject matter of a stu- 
dent body with quite diverse learning experiences. 

As with other ETS-College Board instruments, full talent de- 
velopment benefits are limited by the lack of specific informa- 
tion on responses. When an optional essay section of a subject 
exam is administered, however, the institution has the responsi-
bility of grading the essay, which for talent development advo-
cates presents an opportunity to assess students' learning on a
pre- and posttest basis. While this requirement means a signifi-
cant commitment of staff and faculty input and time, institu-
tions are able to assess what is important to them, with their
own criteria.

Cooperative Examination Program of the American Chemi-
cal Society. The Examinations Committee of the American
Chemical Society (ACS) has over the years developed an ex-
tensive effort to assess the various aspects of chemistry.
Whether the concern is with general chemistry, biochemistry,
organic-inorganic, electroanalytical, or physical chemistry,
ACS has an instrument designed to measure the student's level 
of competency (Mitchell 1985). 

Some are designed for use with a terminal, one-semester 
course, such as the Brief Physical Chemistry examination. 
Other ACS instruments, such as the Organic Chemistry exami- 
nation, are geared to a full-year curriculum. In turn, the instru- 
ments range in administration time from 75 minutes (for the 
General-Organic-Biological examination, designed for those in 
an allied health sciences program) to 115 minutes (suggested 
for the ACS Examination in Organic Chemistry). 

One ACS offering that should be of special interest is the 
Toledo Chemistry Placement Examination, designed to assess 
the chemistry background of entering freshmen and then deter-
mine the level at which they should continue their study. This
ACS instrument most readily manifests talent development 
value for the institutional user. On a pretest, the results provide 
a measure of individual and aggregate achievement useful to 
decisions about academic placement. When readministered as a
posttest, the data can indicate both students' progress and in-
structional impact in this subject.

Similarly, pre- and posttest use of other ACS instruments 
shows students' change in competence in subject matter. For
those who view retesting with the same instrument as a demon- 
stration of test mastery rather than change in competency, how- 
ever, consideration should be given to using both a current and 
older form of the test in question. Keep in mind, however, that 
the ACS examination program periodically updates its exams 
and removes older forms from circulation.

Single-subject competency tests. The Duke University Politi-
cal Science Information Test (American Government), the Har-
vard-MLA Tests of Chinese Language Proficiency, the Sare-
Sanders American Government and Constitution Tests, the
Cass-Sanders Psychology Test, and the Test of Spanish and
Latin American Life and Culture all represent an effort to as-
sess a single subject, providing institutions with measures for
these special interests. Some may have been developed in re- 
sponse to the assessment needs associated with a particular 
course, as is the case of the Cass-Sanders test for a first course 
in psychology. Others seem to have been the outcome of a spe- 
cial institutional effort (for example, the Duke University or 
Harvard-MLA venture). With just such assessment initiatives, 
administered before and after, an institution of higher education
places itself in a position to gauge more directly its impact on 
students' learning over time. Beyond their specific cognitive 
emphasis, however, these instruments point to the possibility of 
institutions' devising their own measures for assessment. They 
demonstrate that schools can design their own instruments for 
areas of special concern to them, especially when adequate as- 
sessment instruments are lacking. Specifically designed assess- 
ment that is geared to a college's or university's actual 
curricular efforts is indeed integral to the talent development 
approach described in this work. 

INCREASING THE USEFULNESS OF OUTCOMES ASSESSMENTS



A successful student outcomes project not only measures im-
pact: It also produces impact. The successful project becomes a
tool for administrators, trustees, faculty, students, and external
reviewers to use in evaluation and decision making. Yet all too
often, outcomes assessments fall short of this goal (Astin 1977;
Baird 1976; Bowen 1980; Ewell 1983; Weiss 1988).

The difficulties of applying research findings to curriculum, 
policy, and program development are not unique to higher edu- 
cation. Utilization studies have repeatedly indicated that practi- 
tioners from a variety of disciplines and settings often neglect 
relevant research and evaluation data (Ciarlo 1981; Knorr
1977). In response to such observations, evaluation researchers 
have increasingly turned their attention to the use of assessment 
data in program and policy development (Weiss 1988). This
section reviews some literature on use and discusses its applica-
tion to student outcomes assessment.

Several aspects of the talent development perspective contrib- 
ute to bridging the gap between researchers and practitioners. 
By rejecting an adversarial approach to evaluation in favor of 
an informational approach, the talent development perspective 
reduces defensiveness and hostility to evaluation. By emphasiz-
ing longitudinal designs with pre- and posttesting, talent devel-
opment assessments reduce the ambiguity of assessment
findings; researchers and practitioners are more likely to agree 
on the interpretation of the results. Evaluation data are most
likely to influence decision making when top administrators and 
researchers agree on the goals of the institution and the goals of 
the assessment and perceive information about outcomes as an
important source of feedback about organizational effectiveness 
(Weiss and Bucuvalas 1977). The talent development approach 
addresses each of these issues and thereby provides a frame- 
work that researchers, faculty, administrators, students, and 
others can share. 

Before discussing more specifically the factors that promote
or hinder utilization of data about outcomes, we need to define
utilization. How do we know whether the research findings
have been used? If we think of utilization as a continuum rather 
than a dichotomy, then what level of utilization might we strive 
for or accept as sufficient? 

For the most part, researchers in applied outcomes hope that
their findings may be "directly translated into political mea-
sures and action strategies" (Knorr 1977). When this situation
occurs, researchers will see their recommendations widely read,
discussed, and adopted.

While this approach may represent an ideal model, data are 
often used in other ways: 

• To focus attention on an issue or to generate activity re- 
lated to the issue. For example, the recent evaluative re- 
ports on higher education have served a generative 
function by stimulating discussion and activity about the
quality of postsecondary education. 

• To delay, substitute for, or legitimate a policy decision. 
Administrators may stall action on an issue by requesting 
a research project to "collect additional information" or
"make sure all the facts are in." Or the administrator may
use data about outcomes to support a decision that has 
been made for other reasons. 



Information about outcomes is sometimes most useful in estab- 
lishing a context for decision making rather than in establishing 
the single correct decision (Ewell 1983). "Increased use of 
student-outcomes information often leads to changes in the way 
certain kinds of decisions are approached— in the kinds of alter- 
natives considered, for example— rather than changes in the 
substance of decisions" (p. 48). "What is needed is informa- 
tion that supports negotiation rather than information calculated 
to point out the 'correct' decision" (Cronbach and Associates 
1980, p. 4). 

Research findings are only one of many things that practi- 
tioners typically consider in decision making and planning 
(Weiss 1988; Weiss and Bucuvalas 1977). In the assessment of 
institutional performance, data about outcomes are supple-
mented by a variety of information, including subjective
impressions, informal interactions, anecdotes, committee re- 
ports and recommendations, reports by external funding and ac- 
creditation agencies, and institutional ratings and reputation. 
Further, while researchers may be convinced of the validity of 
their data relative to other evidence, the administrator may see 
no reason to elevate research findings above other sources of
information. And while researchers often assume that decision 
making within the institution is a rational process, it is in fact 
subjective and unsystematic (cf. Weick 1979). 

Facilitators of Useful Research

How then can researchers encourage campus leaders to apply 
data about outcomes in decision making? This section discusses 
a variety of factors that increase the likelihood that data about 
outcomes will be applied to curriculum, policy, or program de- 
velopment. 

Involvement 

The literature on use of outcomes and evaluation shows consen- 
sus on the importance of involving practitioners in research, 
from the initial conceptualization of the research questions to 
the content and organization of the final report. "The greater 
the level of participation of potential users in the various phases 
of the project, the more likely users are to identify with the
success of the project" (Siegel and Tucker 1985, p. 323).

Similarly, useful research emerges from an action research
perspective that requires interpersonal and political as well as 
technical abilities (Buhl and Lindquist 1981). Action research is 
characterized by communication between researchers and key 
practitioners for the duration of a project on outcomes. In addi-
tion to research skills, the active researcher must develop facili-
tative skills and networking and information diffusion skills and
must learn about alternative administrative and faculty practices
(Lindquist 1981).

In addition, reporting should be a continuous activity, not 
only the final activity (Guba and Lincoln 1981). The researcher
and the target audiences must interact in producing judgments 
and recommendations. 

A review of 20 case studies from the evaluations filed at the 
Office of Health Evaluation of the U.S. Department of Health, 
Education, and Welfare concludes that use strongly depends on 
personal and interpersonal factors (Patton et al. 1977). If re-
search is to have an impact, somebody must care about it and
must have the leadership ability, energy, and commitment to 
ensure that the research receives attention. Institutional re- 
searchers can facilitate this process by identifying key decision 
makers and by working collaboratively with them to provide 
relevant and credible information. 

Involvement of practitioners provides both direct and indirect 
benefits. Among the former are assurance that practitioners are
aware of the research project, that the research addresses issues
of concern to them, that the methods used are credible, and
that the results are presented in a format that facilitates use. In-
volvement of decision makers also provides indirect benefits by
increasing participants' sense of investment in, or ownership 
of, the project. They will be less likely to neglect a report that 
incorporates their suggestions and concerns, and they will be
more interested in seeing the project succeed as a consequence 
of contributing to its development. They will be more likely to 
trust the researcher and to perceive him or her as competent for 
having taken the time to consult with campus leaders and re- 
spond appropriately to their suggestions. 

A number of activities can be used to increase decision mak-
ers' involvement in research (Lindquist 1981). For example,
participants can be asked to listen to taped interviews and ana- 
lyze them together. Before data analysis, decision makers can 
be exposed to the raw data or to preliminary tabulations and 
asked to indicate the types of analyses they would most like to 
see. Brainstorming sessions can be scheduled after data analysis 
to generate recommendations and discuss the implications of 
the findings. 

The participants in the collaborative process should include
not only the identified "client" (that is, the administrator or 
department that requested the research) but also all the potential 
audiences for the research, which would probably include a 
range of administrators, program personnel, faculty, and stu- 
dents (Dawson and D'Amico 1985; Deshler 1984; Guba and 
Lincoln 1981; Moran 1987). 

The involvement of practitioners in the research process is a
necessary, but not sufficient, element of useful research. For 
example, such involvement will not be fruitful if stake holders 
in the assessment hold conflicting assumptions and values about 
the goals of the institution or of the assessment. Thus, before 
involving practitioners directly in the design of an outcomes as-
sessment, the researcher may need to resolve conflicts in value.

Values 

The choice of outcomes to assess, the instruments used, sam-
pling and analysis procedures, the selection of comparison
groups, and the organization of the final report are all value-
based to some extent. Utilization is enhanced when both practi-
tioners and researchers accept the same underlying model or
theory of student outcomes and agree on the importance of as-
sessing specific outcomes among particular students in a certain
manner.



Research based on models or theories different from those 
held by decision makers is likely to be perceived as inappro- 
priately oriented and therefore irrelevant. "It is important to 
stress that while [outcomes] information . . . should be as ac- 
curate as feasible, standards of accuracy are less important than 
are standards of relevance" (Ewell 1984, pp. 57-58). 

The talent development approach provides opportunities for 
researchers and practitioners to clarify their implicit values and 
beliefs. Discussions among faculty, administrators, students, 
trustees, and legislators about educational and developmental 
priorities are a crucial element in designing assessments of out- 
comes. The resulting longitudinal assessment reflects institu- 
tional values by focusing on the outcomes of most importance 
to those involved in the assessment. 

By involving practitioners in the design and analysis of re-
search and by clarifying previously implicit values and assump-
tions, the researcher is attending to process issues. Process
threats to utilization can be further reduced by acquiring sup- 
port for the assessment from institutional leaders. 

Support of top administration 

The support of top administrators is often crucial to the use of 
research results. Chief executive officers should communicate 
to their managers and administrators the importance of the proj- 
ect to create a climate on campus that is receptive to the data 
(Forrest 1981). And utilization can be increased when adminis- 
trators offer incentives to those willing to undertake "informa- 
tion-based qualitative improvements in programs and services" 
(Ewell 1984, p. 58). Administrative support has certain advan-
tages:

Any effort at dissemination [of research data] is unlikely to
be successful unless the top administration clearly supports
the project. Strong administrative backing serves at least two
critical functions: it provides committee members with an in-
centive to move ahead with the project and to find policy-
relevant recommendations in the data; and it maximizes the
chances that recommendations will be put into action (Astin 



Technical factors 

The issues involving process are necessary but not sufficient in
conducting useful research. Reviews of utilization demonstrate 
that the quality of the research is also positively associated with
utilization (Forrest 1981; Guba and Lincoln 1981; Kinnick
1985). An issue that many researchers have ignored, however, 
is the interaction of technical and political factors, such that 
some research is subject to extensive methodological criticism
while other research, sometimes of questionable quality, wins 
acceptance quite easily. Especially in academic settings, techni- 
cal criticisms of research may mask other motives for disre- 
garding the data. 

Interviews with 200 decision makers in mental health admin- 
istration found that quality of research was an important predic- 
tor of use (Weiss and Bucuvalas 1980). Respondents rated 
quality of research as the single most important factor in deter-
mining their own likelihood of using research in decision mak-
ing but as only the second most important factor (behind
"action orientation") in determining use by others. Thus, attri-
butional patterns and social desirability may have influenced re-
spondents' ratings.

The actual importance of the quality of research to its use is 
also questionable, as members of an organization often claim to 
support a rational model of decision making that may have lit- 
tle correspondence to their actual decision-making patterns 
(McClintock 1984; cf. Campbell 1984). 

The perfect study of outcomes has not yet been conducted 
and never will be, and all outcomes research is therefore sub- 
ject to methodological criticism. Probably the best way to avoid 
politically motivated criticism of methodology is to involve po- 
tential critics in the design of the project. Under this approach, 
debates about research methods occur before rather than after 
data collection and analysis, and target audiences are less likely 
to dismiss results emanating from a research design they had a
part in shaping. 

Because the methodological challenges in outcomes research 
have been reviewed previously, a comprehensive review of 
technical factors is not provided here. The literature about utili- 
zation of social science data raises a number of additional is- 
sues for consideration, however. 

First, qualitative approaches can often be a useful supple- 
ment to quantitative methods. Qualitative data provide a 
behind-the-scenes look at statistical data that can render re-
search reports more interesting and less intimidating to decision
makers. For example, case studies are recommended for four 
purposes: to chronicle, to characterize, to teach, and to test
(Guba and Lincoln 1981). This approach is often dangerous,
however, because qualitative or anecdotal information may dis- 
tort or misrepresent the actual meaning of quantitative findings. 
A case study of a "useful" program evaluation describes the
use of an "interactive methodology" that combined qualitative
and quantitative data to inform administrative decision making
(Moran 1987). 

Second, qualitative data alone are generally insufficient to 
satisfy the concerns of target audiences. Data about outcomes 
are most likely to be applied to policy development when ob- 
jective techniques are used (Forrest 1981). An important ele- 
ment is comparative data that allow decision makers to 
compare findings against some meaningful norm or standard. A 
finding that 12 percent of graduating seniors go on to graduate 
or professional school, for example, has more meaning when 
decision makers know that the figure is 24 percent for similar 
schools or was 8 percent two years ago (cf. Kinnick 1985).
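
The comparison described above can be sketched in a few lines of Python. The figures are those used in the example; the function name compare_to_benchmarks and the layout of its result are assumptions made for illustration only, not a procedure drawn from the sources cited.

```python
def compare_to_benchmarks(current, peer_norm, prior):
    """Return the differences, in percentage points, between a current
    figure and two reference points: a peer norm and the institution's
    own earlier figure. (Hypothetical helper for illustration.)"""
    return {
        "vs_peer_norm": current - peer_norm,
        "vs_prior": current - prior,
    }


# Figures from the example in the text: 12 percent of graduating seniors
# continue to graduate or professional school, against 24 percent at
# similar schools and 8 percent at the same institution two years ago.
result = compare_to_benchmarks(current=12, peer_norm=24, prior=8)
print(result)
```

The same 12 percent figure reads as a shortfall against the peer norm and as an improvement against the institution's own past, which is why the choice of standard should be made explicit to decision makers.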

Third, because practitioners often have difficulty basing im-
portant decisions on a single study, survey, or test, convergent
findings can lead to more confidence in the accuracy of data 
about outcomes. Some writers recommend that researchers 
adopt a strategy of multiple perspectives (Palola and Lehmann
1976). This approach has five components: multiple observers 
of students' learjiing, multiple methods of assessment, multiple 
standards for evaluating students' learning, multiple decision 
makers using data for a variety of policy issues, and multiple
time periods for measuring change in students' learning. In this
manner, decision makers' concerns about any one approach 
could be reduced by providing convergent or alternative mea- 
sures. Further, the multiple perspectives approach maximizes 
opportunities to apply the research to various policy issues 
within the university. A "multimodel" for evaluation research
is recommended, to include multiple perspectives, levels, meth- 
ods, functions, impacts, reporting formats, and so on (Scriven 
1983). 

A number of researchers with experience in value-added as- 
sessment report on the benefits of multiple measures. "To- 
gether, different kinds of measures of the same outcome 
dimension undoubtedly provide a full picture of the dynamics 
of a particular educational experience" (Ewell 1983, p. 63; cf. 
Banta and Fisher 1987; McClain and Krueger 1985; Ment-
kowski and Loacker 1985).

Dissemination

The manner in which research findings are disseminated signifi-
cantly influences the extent to which and the manner in which
the findings are used in decision making. Dissemination
is an ongoing process of communication between researchers
and practitioners. It should be conceptualized as a mutual ex-
change between researchers and target audiences rather than as
a flow of information in one direction (from researcher to deci-
sion maker) only. In this way, the final report becomes a prod-
uct of the collaboration between researchers and administrators,
and administrators are therefore more likely to perceive it as
useful (Forrest 1981; Guba and Lincoln 1981).

Congruent with multiple perspectives, a variety of methods
of dissemination can be employed, ranging from informal 
brainstorming sessions to formal, written reports. A number of 
researchers suggest that several different reports should be pre- 
pared, each one tailored to the specific concerns of target audi- 
ences (Ewell 1983; Forrest 1981). 

When communication between researchers and administrators 
has been ongoing and open, the final report will contain no ma- 
jor surprises. Although many researchers believe their data may 
receive more attention if the findings are unexpected, counter- 
intuitive findings are instead likely to be dismissed or ignored 
(cf. Guba and Lincoln 1981). This is not meant to suggest
that only findings that confirm decision makers' beliefs or
knowledge will gain recognition. Rather, unexpected results
should be communicated to target audiences at an early stage to 
provide opportunities for decision makers to assimilate the new 
information and avoid defensive reactions. 



Timing 

The timing of reports is a crucial factor in use of results. One 
approach is to release reports when funding decisions are being 
made, as student outcomes may provide information about the 
effectiveness of existing programs, the need for additional serv- 
ices, or the need for program or curricular revisions (cf. Siegel 
and Tucker 1985). If the study is being sponsored by a campus
committee or department, researchers must strive to deliver the 
final product on schedule. A possible exception is when other 
events occurring at the time would overshadow the release of 
the report on outcomes; under such conditions, the researcher 
might wait until the audience(s) would be more likely to pay
attention to the findings (cf. Siegel and Tucker 1985). 



Recommendations 

Some disagreement emerges in the literature regarding the risks 
and benefits of providing recommendations for action based on
research findings as opposed to simply presenting the data and
allowing practitioners to develop their own recommendations. 
Not surprisingly, recommendations for incremental changes 
have met with less opposition from policy makers than recom-
mendations for fundamental changes.

In some instances recommendations that state goals (ends)
are more effective than those that delineate specific courses 
of action. This provides direction to users while permitting 
them considerable latitude in selecting ways of achieving the 
goals of the recommendations. Also it is oftentimes easier to 
achieve a consensus around ends rather than means. Parties 
asked to make changes are usually more willing to do so if 
they retain some control over how these changes will be re- 
alized (Siegel and Tucker 1985, p. 316). 

Further, researchers should make clear the connection between 
their recommendations and their data. To the extent that recom-
mendations are perceived as politically based rather than data
based, decision makers are less likely to use them. 

Other researchers, however, have found that utilization was 
positively associated with reports that contained explicit recom- 
mendations for action— have, in fact, found a positive associa- 
tion between reports that challenge the status quo and 
utilization of research by decision makers (Weiss and Bucuva- 
las 1980). 

Guba and Lincoln (1981) suggest one way to understand
these different findings. Whereas Siegel and Tucker implicitly
assume that researchers develop recommendations indepen-
dently and then provide them to decision makers, Guba and
Lincoln suggest that researchers develop recommendations in 
collaboration with decision makers. Under these circumstances, 
target audiences might more positively receive explicit recom- 
mendations for action or recommendations of a more funda- 
mental nature. 

Another approach to developing useful recommendations 
suggests that time constraints often force researchers to develop 
recommendations without a full consideration of the possible 
alternatives for action suggested by the data (Roberts-Gray, 
Duller, and Sparkman 1987). Rather than leave recommenda-
tions to the end of the research process, perhaps researchers
should write recommendations during research design, using a
"what if" approach.



By thinking at the beginning about recommendations that 
may be made at the close of the evaluation, the evaluator
helps ensure that evaluation results will contribute to pro-
gram improvement. . . . The logic linking data with action is
spelled out and easy to trace. . . . It can show where addi-
tional data are needed and identify areas where data thought
to be needed would be useless in fact (Roberts-Gray, Duller, 
and Sparkman 1987, p. 681). 

Report format 

The format of the report is another factor related to utilization. 
Several researchers recommend that reports be organized
around issues rather than methods (DeLoria and Brookins 1984; 
Ewell 1983; Forrest 1981; Kinnick 1985). Reports should di- 
rectly address practitioners' concerns— which may require writ- 
ing several reports or memos, each focusing on a different 
issue. Reports should be brief, avoid research jargon, and use 
graphics to summarize and display major findings (Forrest
1981; Guba and Lincoln 1981).

A set of useful recommendations about writing research
reports for decision makers suggests that the traditional
"dissertation-style" approach may be inconvenient for decision
makers because "the details needed to answer a single policy
question may be scattered across several chapters" (DeLoria
and Brookins 1984, p. 648). The time and effort required to
locate and integrate relevant information may deter use of the
report. 

As an alternative to the traditional approach, researchers 
should prepare two reports— one scientific and one policy 
(DeLoria and Brookins 1984). The latter would be brief, orga-
nized around major policy questions, and in the language of the
practitioner. Reports that get used in decision making have the 
following characteristics: 

1. The questions addressed are clearly linked to real policy 
decisions. 

2. At least some questions in each report consider the costs
affecting policy.

3. Policy questions form the central organizing theme of
the report.

4. The reports describe enough of the policy context to per-
mit informed interpretation without outside sources.

5. Evaluation methodology is played down.

6. Reports begin with a brief summary of the essential find-
ings.

7. Backup narrative for the executive summary is
"chunked" into easily located, brief segments through-
out the body of the report.

8. Only simple statistics are presented.

9. Where jargon is used, it is the jargon of the practition-
ers, not of the evaluators.

10. Concrete recommendations for action are based on spe-
cific findings (DeLoria and Brookins 1984, pp. 660-62).

Within higher education, researchers must walk a fine line 
between turning off their audience by being too technical and
turning off their audience by being too simplistic. Especially 
when professors trained in research will be reading the reports, 
detailed information about sampling, design, and analysis may 
be desirable to establish the validity of the methods employed. 
This technical information, however, should be provided in an 
appendix or self-contained chapter, with the most important in-
formation repeated in other sections that are devoted to major
questions of research and policy. 

Structures and settings 

The ideal setting is one in which decision makers can jointly 
review and discuss the research data. Committees associated 
with the major campus issues provide opportunities for consid-
eration of research findings and implementation of recommen-
dations. Open forums could be held as well to encourage a
broad range of students, faculty, and staff to discuss the find-
ings. Or top administrators might sponsor a retreat for adminis-
trators to review the data and brainstorm about its implications
for action. 

Special events created specifically to consider the research on
outcomes have the advantage of emphasizing administrative 
commitment to the project and of providing a setting in which 
the project is a primary (or exclusive) focus. On the other 
hand, when discussion of the findings is integrated into ongo-
ing committees or task forces, the research on outcomes may
come to be perceived as relevant to day-to-day decision making
and an integral aspect of "management systems." 

The importance of organizational factors can be summarized 
as follows: 

One way of increasing the likelihood that student outcomes 
information will be used by decision makers is to put the in-
formation in a form suited to some of their regular activities.
For most decision makers, student outcomes information falls
into the category of "nice to know" rather than "need to
know." Outcomes information is much more likely to be rec-
ognized as relevant if it is not seen as distinct from the kinds
of productivity information upon which most decision makers
claim to base their findings (Ewell 1983, p. 48).

Barriers to Use 

This section describes additional factors that hinder utilization. 

Gap between researchers and practitioners 
While researchers traditionally strive for objectivity and neutral- 
ity, advocacy is an important element in the administrative 
role. And while researchers may prefer complex methods and 
an extended time frame for data collection and analysis, deci-
sion makers require information that can be quickly obtained
and easily assimilated. These and other differences between re-
searchers and administrators may lead administrators to per-
ceive research data as irrelevant in their decision making (cf.
Caplan 1977; Siegel and Tucker 1985). One possible approach 
to this problem is for researchers to have the foresight to build 
data bases that ultimately will provide a resource for getting 
rapid and sophisticated answers to complex questions. 

Although the goals of most colleges and universities include
the support of research activities, administrators may fail to 
perceive these activities as useful in meeting their own needs 
for information. Therefore, the researcher must educate admin-
istrators about the potential benefits of research and must re-
spond to the values, language, and goals of target audiences.

The institution's decentralized structure 
The benefits of using data available on campus have been dis- 
cussed in previous sections. This task may be rendered difficult 
when relevant data elements are located in different sites on 
campus and when the data are collected or processed in such a 
way that they are difficult to merge with other information (Kinnick
1985). Some additional expenses could be incurred as well if
data must be recoded or rekeyed. Such problems can almost al-
ways be solved if sufficient time is allowed and if top adminis-
trators communicate the importance of the effort to those who
manage the data. 
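
The merging task described above can be illustrated with a minimal Python sketch. It assumes that each campus office can supply an extract keyed by a shared student identifier; the function name merge_records, the office names, and the field names are all hypothetical and are used only to show the idea.

```python
def merge_records(*sources):
    """Combine per-student records from several offices into one record
    per student, keyed by a shared student identifier."""
    merged = {}
    for source in sources:
        for student_id, fields in source.items():
            # setdefault creates an empty record the first time an ID is seen.
            merged.setdefault(student_id, {}).update(fields)
    return merged


# Hypothetical extracts from two campus offices.
admissions = {"s01": {"entry_score": 42}, "s02": {"entry_score": 55}}
registrar = {"s01": {"credits": 96}, "s03": {"credits": 30}}

combined = merge_records(admissions, registrar)
print(combined)
```

Students who appear in only one extract keep a partial record, which makes gaps in coverage visible before any analysis begins. Where offices use different identifiers or coding schemes, the recoding mentioned above has to happen before a merge of this kind is possible.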

Another problem can arise from the decentralized nature of 
the university: The decentralized structure of most schools 
means that no one office or department is responsible for stu- 
dent outcomes (Ewell 1983). Support from top administrators, 
especially incentives for collection or application of data about 
outcomes, can overcome this barrier. 

Faculty resistance 

Resistance from faculty is often cited as a reason that assess-
ments of outcomes are inappropriate for a particular institution
(Ewell 1985). Faculty may fear a negative evaluation or may
believe that assessments of outcomes will not accurately mea- 
sure the educational process. Recent research (Astin and Ayala 
1987) suggests that resistance from faculty is a normal part of
any attempt to implement such assessments. It is to be expected
but it can be effectively dealt with. Barriers erected by faculty 
can be overcome by involving faculty in the research, by dif- 
ferentiating assessments of outcomes from teaching evaluations, 
and by using multiple measures to compensate for the limita-
tions of individual instruments.

Cost 

Even when decision makers believe in the value of such assess-
ment, institutional research is one of many programs competing
for limited funds and administrators may be unable or unwilling 
to financially support the research program. Again, support 
from top administrators and early education and involvement of 
key audiences increase the likelihood that the assessment will 
be funded. Costs can be reduced by using data already avail- 
able on campus (Ewell 1985). 

Timing and follow-through 

Late delivery of research is among the most common reasons 
for data's underuse (Kinnick 1985). This situation should be 
avoided at all costs, because it not only reduces (or eliminates)
the usefulness of the current project but also decreases the like-
lihood of decision makers' support for future projects.
Underuse may also result if the researcher fails to conduct
any follow-up activities after the final report is released. Such 
activities may take many forms— releasing additional memos 
and analyses, requesting feedback from target audiences, or 
participating in implementation activities, for example. Without 
such activities, decision makers are likely to be distracted by 
other, more visible issues, and findings will be neglected.

Academic games 

A number of "academic games" can be observed in committee
meetings at most colleges— rationalization, passing the buck, 
obfuscation, co-optation, recitation, and displacement/projection 
(Astin 1976). One of the purposes of such games is to relieve 
committee members of responsibility for action; the games in 
this way act as barriers to utilization. Researchers can use both 
direct and indirect approaches to end the games and maintain 
control of the discussion. 

Paradoxes of Guidelines for Utilization 
Applying the information provided in this monograph poses 
several challenges, including reconciling recommendations that 
appear to be contradictory. This section briefly describes some 
of these apparent conflicts. 

Rational versus irrational decision making 
Applied research assumes that decision making is rational—that 
administrators assess situations, identify problems, generate and 
evaluate potential solutions, and implement the "best" alterna- 
tive. In reality, however, decision making may proceed along 
highly subjective, unsystematic, and even irrational lines (Mc-
Clintock 1984; Weick 1979; Weiss 198^). Under such circum-
stances "legitimating" uses of research may be more likely to
occur than "instrumental" uses. Researchers must weigh the 
risks of their research being misrepresented or distorted against 
the risks of its being ignored altogether. 

Involvement versus control 

The emphasis of action research on involving target audiences 
in the research process may threaten the traditional objectivity
and neutrality of researchers. As researchers try to understand 
and appeal to the values of decision makers, they risk "co-
optation" (Dawson and D'Amico 1985). Similarly, the re-
searcher walks a thin line between profiting from the involve-
ment of decision makers and losing control of the project
(Siegel and Tucker 1985). Opponents of a project might criti- 
cize the research as partisan if researchers have worked too 
closely with target audiences. 

Democracy versus competition in decision making
The "democratic" decision-making process characteristic of ac- 
tion research may conflict with competitive norms found in 
many colleges (Buhl and Lindquist 1981). The participatory
process recommended by most action researchers will be inef- 
fective if decision making is perceived as a competitive situa- 
tion in which one person wins and another loses. When such 
norms are firmly entrenched, the researcher must strive to cre-
ate a safe setting for open discussion with target audiences.

Involvement versus timeliness

While the benefits of involving target audiences in the research
have been discussed at length, it should be pointed out that 
such a process may significantly slow down the progress of the 
research. It takes time to schedule meetings, to consult with 
various stake holders, and to respond to their feedback. Fur- 
ther, most decision makers are very busy people who can in- 
vest only a limited amount of time in the effort. Because late 
delivery of data is a major barrier to utilization, the researcher 
must either be prepared to start the process early or balance 
time pressure against political pressure. 

Methodological rigor versus time and cost 
While comparative, longitudinal studies that conform to estab- 
lished standards of quasi-experimental design, use multiple 
measures, and supplement quantitative findings with qualitative 
data are desirable, they are also expensive and time consuming. 
The obvious rejoinder to this objection is that assessments that 
fail to accurately respond to the research questions are hard to 
justify, regardless of their cost or "efficiency." Researchers 
may have to decide which methodological tradeoffs are least
damaging, however (cf. Cook and Campbell 1979). 

Technical credibility versus readability 
The brief reports recommended by many researchers (for exam-
ple, Ewell 1983; Forrest 1981; Palola and Lehmann 1976) do
not include room for detailed descriptions of research methods. 
Academic audiences, however, may be unwilling to accept the
findings without such information. The additional bulk created
by including this information, on the other hand, may deter de- 
cision makers from reading the report. 

Researcher's objectivity versus advocacy 
Action research places the researcher in the role of advocate as 
well as technician, although writers disagree about the most ef-
fective methods of advocacy (see the previous discussion on re-
search recommendations). Institutional researchers must face
another dilemma: If they act as advocates in one situation,
might that limit their credibility in another? When researchers 
become politically active, will decision makers trust their infor- 
mation on a continuing basis? If researchers decide not to enter 
the political arena, however, will their data be misrepresented
or neglected?

PRACTICAL SUGGESTIONS FOR CONDUCTING ASSESSMENTS



This chapter proposes a number of practical suggestions for im-
plementing a comprehensive program of assessing outcomes.



Be Informative Rather Than Adversarial 
As suggested earlier, a program of institutional outcomes as- 
sessment is likely to be useful if it is based on a talent develop- 
ment approach to excellence rather than the traditional resource 
and reputational approaches, which are inherently competitive 
and therefore adversarial: Who has the brightest students? Who 
has the most prestigious faculty? Who has the largest library?
This competitive approach is further reflected in the ways tradi- 
tionally used to assess students: letter-grade averages, relativis- 
tic measures that pit students against each other. Students are 
thus tested and graded to determine whether they should be ad- 
mitted, awarded credit, or permitted to graduate rather than to 
determine how much and how well they actually learn. This at- 
titude also spills over into attempts to assess faculty members: 
Most assessments of faculty performance are designed to deter- 
mine whether they should be hired, promoted, or given tenure. 
Under such conditions, the institution's assessment program is 
bound to be perceived as a threat. Further, this adversarial view 
of assessment tends to put students and faculty members into 
passive roles: Students and professors submit to assessment and
try to show themselves in the most favorable light possible. 

By contrast, the talent development concept and its associ- 
ated notion of involvement demand a very different purpose for 
assessment. In this case, assessment is used primarily for feed-
back to increase the involvement of students and faculty mem-
bers and to develop their talents as completely as possible.
Such assessment is active rather than passive, as it is designed 
to facilitate and improve performance rather than merely to 
evaluate it. Furthermore, the information gathered is used to 
benefit the parties involved rather than to pass judgment on 
them. 



In general, the authors recommend that researchers use a
combination of established and locally designed instruments.



Build on What You Already Have 

The talent development approach to assessment does not neces-
sarily require that institutions embark on an entirely new pro-
gram of testing and evaluation. For example, most institutions
already employ some kind of testing program for admissions,
and many also use various types of placement tests. Under a
talent development approach, these admissions and placement
tests can be viewed as a kind of "pretest" for subsequent 
follow-up assessments ("posttests") that could provide a longi- 
tudinal measure of change or growth in students' competence. 
At the same time, upper-division competency tests in writing, 
basic academic skills, and related areas, which have become in- 
creasingly popular and have even been mandated in some pub- 
lic institutions, might provide an important "posttest" that 
could be "pretested" with the same or similar device at the 
time the student enters the institution. These pretests on upper-
division competence, incidentally, can also be important guides
for effective placement and counseling. Indeed, it may well be
that pretesting students with upper-division tests of competence
at the time of entry could replace currently used admissions or
placement tests, thereby obviating the need for any increase in
the amount of assessment. 

But perhaps the most important existing assessments to be 
elaborated into a talent development context are classroom
examinations. Most undergraduate courses involve some kind of
final examination, and many also involve midterm examinations
of various types. In most courses, these same (or parallel)
exams could be given to the new student at the time of initial
enrollment in the course, thereby providing a baseline against 
which to measure change in the midterm and final examina- 
tions. An important additional benefit from such pretesting in 
the classroom is that it gives students a very concrete idea of
what is to be expected in the course and of how much growth 
the student must demonstrate to reach acceptable standards of 
performance. 



Start Simply

For institutions that have not already established a tradition of 
comprehensive assessment, it is important to initiate any new 
outcomes assessment modestly with minimal disruption of insti- 
tutional activities. A more comprehensive and complete system 
can evolve from these modest beginnings. 

Institutions often resist comprehensive assessments of stu- 
dents' cognitive development because of their high costs and 
logistical problems. This resistance can be compounded by the 
fact that such assessments require a substantial lapse of time 
between pretest and posttest before any useful information on 
students' growth and development is obtained. One short cut
that can provide useful information almost immediately is the
use of so-called "surrogate" measures of students' cognitive
development. Thus, questionnaires can be administered to stu- 
dents that involve three kinds of questions: (1) self-reports 
about how much students think they have actually improved 
their skills and knowledge in various areas (a kind of quick- 
and-dirty value-added assessment after the fact); (2) student rat- 
ings of a wide range of university experiences and services, in- 
cluding classroom teaching, counseling, residential facilities, 
and so forth; and (3) a "time diary" in which students provide
information about their level of involvement in various activi- 
ties by indicating how much time they spend on studying, dis- 
cussing class subject matter with students and faculty, and so 
on. All three can be obtained from a single questionnaire ad- 
ministered to students at any time during their undergraduate 
years. The results of such assessments can be analyzed rapidly 
and disseminated to faculty, staff, students, and others who 
have an interest in the results. A national program that pro- 
duced normative information on such matters is the Follow-Up 
Student Survey (FUSS) conducted by the Higher Education Re- 
search Institute at UCLA. 

Develop a Data Base 

As noted, it is important for an institution that does not have a 
well-established tradition of longitudinal student assessment to
begin to develop such a system modestly. A minimally useful 
student data base should incorporate the following core ele- 
ments: 

1. Successful completion of a program of study. In its simplest
form, this measure would involve a dichotomy: The
student either completes a program or drops out. More
sophisticated approaches to this measure would involve 
determining whether a student's undergraduate achieve- 
ments are consistent with his or her degree plans at col- 
lege entry. 

2. Cognitive development. The basic purpose of this category
of information is to determine whether the institution
is achieving its basic instructional purpose: to develop its
students' cognitive abilities. Again, the "surrogate" measure
of longitudinal cognitive development (the student's
self-report of learning in various subject areas) would
seem to be a modest way to start. Ultimately, of course,
it would be important to incorporate actual assessments of
pretest and posttest cognitive performance in areas that
are relevant to the curricular program.

3. Students' involvement and satisfaction. Students' satisfaction
with the institution's program is one of the most important
indications of an institution's effectiveness.
Students should be asked not only about their overall sat- 
isfaction but also about their satisfaction with more spe- 
cific matters: the quality of teaching, advising, 
curriculum, facilities, extracurricular activities, and stu- 
dent services. Perhaps the best way to assess involve- 
ment, as suggested earlier, is to ask students to keep time 
diaries indicating how much time (per week, for example) 
they spend on various activities (studying, interacting 
with each other and with professors, working at an outside
job, engaging in athletics and other activities, and so
forth). 
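
One way to picture such a minimal data base is as a single record per student, with one group of fields for each of the three core elements. The sketch below is illustrative only: the class name, field names, rating scale, and sample values are assumptions introduced here and do not describe any existing system.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class StudentRecord:
    """Hypothetical per-student record covering the three core elements."""
    student_id: str
    # 1. Completion: None while the student is still enrolled.
    completed_program: Optional[bool] = None
    # 2. Cognitive development: surrogate self-reports (assumed 1-5 scale)
    #    plus actual pretest and posttest scores keyed by subject area.
    self_reported_gain: Dict[str, int] = field(default_factory=dict)
    pretest_scores: Dict[str, int] = field(default_factory=dict)
    posttest_scores: Dict[str, int] = field(default_factory=dict)
    # 3. Involvement and satisfaction: ratings and time-diary hours per week.
    satisfaction: Dict[str, int] = field(default_factory=dict)
    weekly_hours: Dict[str, float] = field(default_factory=dict)

    def measured_gain(self) -> Dict[str, int]:
        """Posttest minus pretest for each area tested at both points."""
        return {
            area: self.posttest_scores[area] - self.pretest_scores[area]
            for area in self.pretest_scores
            if area in self.posttest_scores
        }


# Invented sample values, for illustration only.
record = StudentRecord(
    student_id="S001",
    pretest_scores={"writing": 41, "mathematics": 30},
    posttest_scores={"writing": 50},
    weekly_hours={"studying": 14.5},
)
gains = record.measured_gain()
print(gains)
```

An institution starting modestly could fill in only the completion flag and the self-report fields at first, adding pretest and posttest scores as actual assessments become available.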

Be More Absolute, Less Relative

Almost all of the widely used aptitude and achievement tests in 
higher education follow a similar practice in test design and
construction: A list of multiple-choice test items is developed
and administered to a sample of students. The number of items
answered correctly (possibly with adjustments for wrong answers)
is calculated for each student and then converted into a
derived measure, such as a standard score or a percentile score. 
This process of conversion basically wipes out the fundamental 
information about how many or what percent of questions the 
students answered correctly, which questions were answered 
correctly, and so on. Instead, it provides only relativistic infor- 
mation derived from the normal curve, that is, reflecting only 
how one student has performed relative to others. 

While such relative measures are used almost universally in 
large-scale national and state examinations, they present some 
potentially serious problems. Besides indicating nothing about 
the student's absolute level of performance, such relativistic 
scores give no information about how difficult the items were 
or what the student's test performance implies about potential
for performing well on the job, profiting from further education,
and the like. More important, such relative measures offer
no way of reflecting changes in the student's performance over 
a period of time. Thus, it is possible for a student's absolute 
(actual) level of performance or competence to improve considerably
over a period of time, while his or her relative performance
remains the same or even declines during the same period.
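
The point can be illustrated with a small numerical sketch. The scores below are invented for illustration and are not drawn from any instrument discussed here; the function name `percentile_rank` is likewise hypothetical. Every student in this made-up cohort answers ten more items correctly at posttest than at pretest, so absolute performance rises while each student's standing relative to the others does not change.

```python
def percentile_rank(score, cohort):
    """Percent of the cohort scoring strictly below `score`."""
    below = sum(1 for s in cohort if s < score)
    return 100.0 * below / len(cohort)


# Hypothetical raw scores (items correct out of 60) for five students.
pretest = [22, 28, 31, 35, 40]
# Assumed for illustration: everyone gains exactly 10 items by the posttest.
posttest = [s + 10 for s in pretest]

student = 2  # index of the student being followed
raw_gain = posttest[student] - pretest[student]
rank_before = percentile_rank(pretest[student], pretest)
rank_after = percentile_rank(posttest[student], posttest)

print(f"raw gain: {raw_gain} items")
print(f"percentile rank before: {rank_before:.0f}, after: {rank_after:.0f}")
```

A report based only on the percentile rank would show no change for this student, even though the raw score shows real growth.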

Measures of absolute performance can be developed from the 
types of multiple-choice items usually found in aptitude or
achievement tests in several possible ways, but perhaps the
most straightforward approach is simply to record the number
of items answered correctly. Change or growth in the student's
development can thus be assessed in terms of increases in the
number or percentage of such items answered correctly. One
useful elaboration of this approach is to develop expectancy tables
that show the probability of various events (graduating on
time, graduating with honors, performing well on the job, and
so forth) as a function of the number of items correctly answered.
Change or growth can then be measured in terms of
increases in these probabilities over time. 
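
A minimal sketch of an expectancy table follows, assuming an institution holds paired records of raw scores and a later event such as graduating on time. The records, the band width of ten items, and the function names are all invented for illustration; a real table would rest on far more cases per score band.

```python
from collections import defaultdict


def band_of(items_correct, band_width=10):
    """Lower bound of the score band containing `items_correct`."""
    return (items_correct // band_width) * band_width


def expectancy_table(records, band_width=10):
    """Map each score band to the observed proportion of students
    for whom the event (e.g., graduating on time) occurred."""
    counts = defaultdict(lambda: [0, 0])  # band -> [events, students]
    for items_correct, event_occurred in records:
        band = band_of(items_correct, band_width)
        counts[band][0] += int(event_occurred)
        counts[band][1] += 1
    return {band: events / total
            for band, (events, total) in sorted(counts.items())}


# Hypothetical records: (items answered correctly, graduated on time?)
records = [
    (12, False), (15, False), (18, True), (19, False),
    (23, True), (25, False), (27, True), (29, True),
    (31, True), (34, True), (36, True), (38, True),
]
table = expectancy_table(records)

# Growth expressed as a change in probability for one student who
# moves from 17 items correct at pretest to 26 at posttest.
before = table[band_of(17)]
after = table[band_of(26)]
print(table)
print(f"probability of graduating on time: {before:.2f} -> {after:.2f}")
```

Expressed this way, a gain of nine raw-score items becomes a statement about how much more likely a concrete outcome has become.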

Another method is to label particular points on the distribu- 
tion of scores (whether they be raw or derived scores) in terms 
of the level of performance typical of that point. For example, 
if one were interested in using an outcome measure of writing
skill to certify students for graduation, the lowest scores might
indicate borderline literacy, and the highest scores might correspond
to the level of writing competence required of students
pursuing a doctoral-level graduate education. The significance 
of the scale points would be made even clearer if examples of 
actual items were used to show the most difficult types of items 
passed by the majority of people scoring at a particular point 
on the scale. 

Get More from Your Standardized Tests

Given the heavy use of standardized tests by most colleges and 
universities, it is unfortunate that so little of the information 
collected in these tests is actually used for educational purposes 
to enhance students' talent development. One way to enhance 
the educational usefulness of such instruments is to obtain in- 
formation concerning the student's raw scores as well as stan- 
dardized or derived scores. Such information is readily 
available from the testing organizations and should be requested 
by all institutions that use these tests for admissions, placement,
or other purposes.

Another potentially more important type of information is the 
students' performance on individual test items. If it were possible
to know how students perform on individual items (which ones
they find most difficult and which ones they find relatively
easy), such information could be invaluable in planning
and evaluating the curriculum. 

Testing companies have resisted providing information on 
performance on individual test items on the grounds that such
information is "unreliable." This objection may be valid in the 
case of individual students, but it is not relevant to information 
that could be provided in aggregate form. That is, it would be 
extremely valuable to faculty members to know how a group of 
students performed on each test item. Again, testing organizations
should be able to provide such information at relatively
little additional cost. 
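
The kind of aggregate item report described here can be sketched briefly, assuming the institution receives a table of scored responses (1 for correct, 0 for incorrect) with one row per student and one column per item. The response data and function name below are invented for illustration.

```python
# Hypothetical scored responses: one row per student, one column per item.
responses = [
    [1, 1, 0, 1],
    [1, 0, 0, 1],
    [1, 0, 0, 0],
    [1, 1, 1, 1],
    [0, 1, 0, 1],
]


def proportion_correct(responses):
    """Proportion of the group answering each item correctly."""
    n_students = len(responses)
    n_items = len(responses[0])
    return [sum(row[i] for row in responses) / n_students
            for i in range(n_items)]


p_values = proportion_correct(responses)

# Item positions ordered from most to least difficult for this group.
hardest_first = sorted(range(len(p_values)), key=lambda i: p_values[i])

for i in hardest_first:
    print(f"item {i + 1}: {p_values[i]:.0%} of students answered correctly")
```

Because each proportion is based on the whole group, it does not depend on the reliability of any single student's response to a single item.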

Another objection to the provision of data on performance on 
individual test questions is the need to protect test security. 
This argument is really a weak one, given the theory underly- 
ing the construction of most achievement tests. Briefly, the 
items for such tests are selected from a hypothetical "domain" 
of all possible test questions that could be asked about the par- 
ticular subject in question. If providing feedback on individual 
items to institutions violates the security of a particular set of
items, then the test company can simply write new items each
year. If the domain is finite, then once all possible test items
have been written and made public, the test makers can sample
randomly from this domain in constructing a new test each 
year. It might be argued that under these conditions professors 
will encourage their students to study for the test by learning 
the answers to all the items. But what is wrong with this approach?
If a student knows the answers to all possible questions
that could be asked about a particular body of knowledge, then
that student knows, by definition, that body of knowledge.

By obtaining access to results for individual test questions on 
an instrument like the College Entrance Examination Board's
SAT, institutions can then repeat some of these tests after one,
two, or four years to measure improvement in students' per- 
formance on specific items. Moreover, if testing organizations 
could be persuaded to perform equating studies where various 
instruments such as the SAT and GRE are equated, the results
of these tests could also be used to measure improvement in 
cognitive performance during the undergraduate years. 

While such changes in testing organizations' feedback may
be difficult for an individual institution to achieve, it should not
be too difficult for several institutions that are members of regional
associations or possibly public systems to join together
to request such modifications from testing organizations. Under
such pressure, a good likelihood exists that testing organizations
will provide the requested data.



Combine the Use of Local Assessment Instruments and
Standardized Instruments

Even though test development can be an expensive exercise, locally
designed tests can often provide the information most
relevant to practitioners (Baird 1976). An important advantage
of locally designed assessments is the "ownership" that comes
from the involvement of faculty and staff in development. This is
especially true in the case of departmental comprehensive examinations
in the major, which presumably cover the subject matter
closest to the hearts of the faculty.

Nationally developed instruments generally have the advan- 
tage of established reliability and validity. In many cases, na- 
tional norms are available, providing additional opportunities 
for comparison. Longitudinal trends may be available as well, 
providing information about change over time in students' ca- 
pabilities. 

The decision about whether to design one's own instrument 
or use an already existing one cannot be made independently of 
the goals for assessment. If the goal of the assessment is to sat- 
isfy concerns about accountability for an external review, use 
of an already existing instrument would probably be most ap- 
propriate. The established validity of the test would lend legiti- 
macy to the assessment, and the opportunity to compare 
institutional results against national norms might be particularly 
important. On the other hand, outcomes assessments for institutional
self-improvement might require specific information that
an established instrument is unable to provide. 

In general, the authors recommend that researchers use a 
combination of established and locally designed instruments. 
The former are often already available on campus (for example, 
SAT and GRE scores and placement tests). This information 
could be supplemented with additional data derived from sur- 
veys tailored to the research questions. 

Exchanging and sharing locally designed instruments among 
institutions offer several advantages. First, a better quality
instrument is obtained because the researcher can benefit from
the experience of colleagues at other schools. Psychometric
information can be obtained from the institution that originally
developed and used the instrument. Second, some comparative
data may be obtained if institutions are willing to share their
findings.

Be Opportunistic 

The practical problems involved in large-scale institutional as- 
sessment of students' competence are not to be underestimated. 
The time of students and faculty members is at a premium in 
most institutions, and any additional assessments that would be 
required to implement the talent development approach should 
be incorporated with minimal intrusion on the time and energy 
of faculty and students. 

Many institutions fail to realize that once students begin to 
attend classes it may be extremely difficult to find a way to 
conduct pretest assessments. It is important to realize that the 
student who is in the process of matriculating for the first time 
is generally in an extremely cooperative frame of mind and 
therefore an ideal subject for pretest or placement assessments. 
It thus makes good sense to capitalize on this opportunity as 
fully as possible and to include as many assessments as might
be needed for a full-fledged program. Follow-up assessments 
are almost inevitably more difficult, as students may never 
again congregate in a single place at the same time and in the 
same cooperative frame of mind. Some institutions may well 
find it necessary to mandate follow-up posttest assessments. In 
those states where some kind of mandatory upper-division competency
assessment (such as the writing requirement at the California
State University) is already in place, this posttest
assessment might also be seen as an occasion to include other 
posttests where appropriate. In short, institutions should attempt 
to identify those points in the students' institutional experience 
where assessments are likely to be least intrusive and most ac- 
ceptable to the larger academic community. 

Use Gentle Persuasion 

As already noted, follow-up assessments of cognitive outcomes 
are frequently difficult to conduct if they are not mandated by 
external agencies or by institutional policy. Requiring all stu- 
dents to participate in outcomes assessments has several ob- 
vious advantages. First, the risk of distorting findings because 
of a biased sample is minimized. Second, more statistical 
power is gained in analyses of results from the larger number 
of respondents. Voluntary participation in the testing may lead
to a large amount of attrition from the project (in addition to
attrition from the institution) that can substantially reduce the 
size of a posttest sample. Finally, required participation of all 
students avoids issues of equity that may be encountered if 
some but not all students are asked to invest their time in the 
project. 

On the other hand, required participation raises both logisti- 
cal and ethical issues. While pretesting can be implemented rel- 
atively easily during freshman orientation, it may be difficult 
and expensive to schedule posttesting sessions for all students 
in a cohort. Further, administrators and faculty may question 
the desirability of required testing if the benefits of outcomes 
assessments have not yet been unambiguously demonstrated. 
And students may object to forced participation in posttesting 
on grounds ranging from lack of time to invasion of privacy. 

When the institutional environment does not favor required
outcomes assessments, researchers can choose among many 
strategies for increasing voluntary participation: 

• educating students about the benefits that will accrue to 
them as a result of their participation; 

• appealing to students' sense of citizenship or educating 
them about the benefits that will accrue to subsequent 
classes as a result of their participation; 

• providing incentives for groups or individuals to partici- 
pate (positive reinforcement); or 

• offering release from some other responsibility in ex- 
change for participation (negative reinforcement). 

The best results might be obtained from a combination of 
these approaches. Educative approaches typically emphasize 
either applying the research findings to institutional self-
improvement or providing individual students with their own
scores and/or aggregated results as a means of increasing in- 
sight into their own development. Where feasible, students 
might also be offered counseling to aid in interpreting and ap- 
plying the results of test scores. 

Possible incentives include a broad range of rewards for par- 
ticipation. Among those that some institutions have used are 
cash prizes, a chance to win a larger prize in a random drawing 
of participants, discounts on campus services, tickets to cultural 
events, and T-shirts. Student groups might also be invited to 
compete for some prize, awarded to the group with the largest
number of participants in the testing. Students' participation
can also be obtained by allowing students to substitute partici- 
pation in the outcomes assessment for some other required task. 

In the short run, incentives are most effective in obtaining 
participation. Educative appeals may be more effective in en- 
suring a sufficient posttest sample, however. If students find 
the incentive for the pretest inadequate compensation for their 
participation, they are unlikely to return for follow-up testing. 

Involve Faculty from the Start 

Several benefits are obtained by including key faculty members 
in all stages of the research. First, faculty can serve as techni- 
cal consultants to the research project. For example, they can 
be asked to develop or review assessment instruments, and they 
can offer guidance on research design and analysis of outcomes 
assessments. In this context, faculty provide a pool of experts 
from which practitioners may draw specialized assistance in 
conducting an outcomes assessment. 

A second benefit is that faculty support will increase when 
opinion leaders are actively engaged in the project. Practition- 
ers will be better able to respond to faculty concerns about out- 
comes assessment when such concerns are expressed at the 
earliest stages of the project. And faculty are less likely to re- 
sist a project to which they have made substantial contribu- 
tions. 

For example, faculty commonly resist outcomes assessments 
because they believe that the assessment instruments fail to ap- 
propriately define the major concepts and methods of their dis- 
ciplines. Under such circumstances, faculty could be invited to 
design their own assessment instrument for measuring students' 
competence. This approach both increases the validity of the 
research instruments and reduces faculty objections to the proj- 
ect. 

Finally, active involvement of faculty will reduce concerns 
that such evaluations will be used punitively— either to identify 
"bad" teachers or to weed out "bad" students. Their partici- 
pation on the project will reinforce administrators' promises 
that the evaluation will not be used in such ways. 




CONCLUSION 



Summary and Review

We have discussed the potential of student outcomes research 
to inform decision making and thereby improve postsecondary 
education. We have pondered the reasons for the frequent 
failure of outcomes assessments to realize this potential. We
believe that we can do better in the future. To this end, we 
have described a variety of approaches to and instruments for 
assessment and have provided suggestions for ensuring that 
outcomes assessments will be both methodologically sound and 
relevant to the interests of institutional leaders and decision 
makers. We have further suggested that the talent development 
perspective provides a theoretical perspective and a methodological
framework that enhance the usefulness of outcomes data
for improving postsecondary education. 

The reaction of researchers and practitioners in higher education 
to the oft-heard call for longitudinal assessment is frequently a 
resounding "yes, but." It is difficult, especially in today's 
educational climate, to be "against assessment," but we can all 
recite a growing number of reasons that we have heard for why 
assessment is ill advised within a particular institution at the 
present time (cf. Ewell 1984): 






• It's impossible to measure what really matters to us. 

• We don't have the time or the money to implement these
ideas. 

• We don't want our faculty to "teach the test." 

• The state/administration/other would misuse the 
information. 

• We'd never get our students to cooperate. 

• We'd never get support from the leadership of this
institution. 

• We'd never get the faculty to cooperate. 



It would be a mistake to dismiss these objections as convenient excuses
for avoiding the technical and practical challenges of assessment. 
In fact, each concern highlights important issues for consideration 
in planning a talent development assessment program. Effective 
assessment programs require that conceptual, methodological, and 
political issues be addressed. To assist readers in determining their 
readiness to implement assessment programs and to briefly review
the material presented in this monograph, the following quick
"self-study" guide is offered: 

1. Has this institution identified its primary goals for an
outcomes assessment? Is the assessment primarily to 
satisfy external demands for accountability or to promote 
institutional self-improvement? What is the institution's 
commitment to a program of longitudinal assessment of 
student development? 

2. Has this institution developed a coherent philosophy of 
institutional mission? Do faculty, administrators, and 
students share a concept of "excellence" that can be 
used to establish more specific educational goals and 
policies? 

3. Based on the educational philosophy, mission, and goals 
of the institution, what outcomes are most important to 
assess? Has the institution developed operational defini- 
tions for these outcomes? 

4. What standardized instruments are currently administered 
to students for placement, diagnosis, or evaluation? Can 
these tests be readministered to obtain useful information 
about students' growth and development? 

5. In addition to those currently in use, are other standardized 
instruments available to measure outcomes of interest? If 
so, do test vendors define the concepts measured in a 
manner that is congruent with the goals and interests of the 
institution? 

6. What are the trade-offs within this institution in using 
standardized instruments (if available) versus developing 
assessment tools internally? How can these approaches 
be combined? 

7. How can the assessment program best complement and 
extend ongoing efforts at assessment within the institution, 
such as placement tests or upper-division competency 
tests? 

8. For any particular instrument under consideration, are
students likely to bottom out on the pretest or top out on 
the posttest? Are scores available on individual items as 
well as for total scores? Are individual scores valid in 
addition to aggregate scores? Are absolute measures of 
performance available in addition to relativistic measures 
of performance? Do the instruments have established 
reliability and validity? Are longitudinal or cross-sectional
comparisons available or potentially available?
If instruments are commercially available, do vendors
provide space for optional, locally designed questions?

9. Is the institution prepared to administer the assessment 
instruments within the framework of accepted standards 
of field research (quasi-experimental design)? That is, 
has the institution identified and secured the cooperation 
of appropriate comparison groups? Is the institution 
committed to a design using pretests and posttests? 
Which potential threats to internal and external validity 
are of most concern, and how might these threats be 
minimized? 

10. Should students' participation in the assessment program 
be required or voluntary? If voluntary, how might 
students' compliance be secured for both the pretest and 
the posttest? How might voluntary participation influence 
the external validity of the assessment? 

11. What possible side effects of assessment for students
should be anticipated (for example, psychological distress
associated with bottoming out)? When are the most
advantageous times to administer student assessments?

12. Has the research team secured the involvement and 
support of key stake holders and target audiences (from 
faculty, administrators, staff, and students) at all phases 
of the project? Do top administrators support the assess- 
ment? Are they committed to using results of assessment 
in evaluation, curriculum or program development, 
planning and policy development, or other forms of 
decision making? What reasons might faculty, 
administrators, or students provide for delaying or 
discounting the assessment program? Has the research 
team responded appropriately to these concerns and 
involved potential critics in the planning process? 

13. Is the research team prepared to complete assessment and 
analysis expeditiously and provide specialized, issue-based 
research reports or presentations that respond directly to the
needs of target audiences?

14. Is it advisable for the research team to provide recom- 
mendations? If so, what approaches will ensure that the 
recommendations are closely linked to the data and that 
decision makers will consider them seriously? 

15. How can assessment data be integrated into a student data 
base that includes elements related to successful
completion of a program of study, cognitive development,
and the involvement and satisfaction of students? 

Recommendations for Future Research 
As more institutions initiate longitudinal assessment programs, 
either on their own initiative or in response to external man- 
dates and incentives, the authors will develop an information 
base that will greatly increase the validity and cost effective- 
ness of assessment. In the meantime, it is necessary to continue 
conducting research and analysis directed at a number of gaps 
in existing knowledge. The following recommendations for fu- 
ture research are therefore offered: 

1. The scarcity of standardized instruments that respond to 
institutional needs and goals in cognitive assessment has been 
widely recognized (cf. Banta and Fisher 1987; Boyer 1987; Ed- 
garton 1987). Those standardized instruments that are designed 
to measure a broad range of "higher order" skills and 
processes (postsecondary-level analytic, communication, and
critical thinking abilities, for example) are often expensive and 
difficult to administer. While we applaud such innovative efforts
as the ACT COMP, the McBer Behavioral Event Interview,
and the ETS Academic Profile, a continuing need clearly
exists for the development of tests that measure cognitive skills and
abilities. 

2. We strongly encourage those institutions that have used
or plan to use the instruments reviewed here (or others) to pub- 
lish or present their experiences so that we can develop a pool 
of knowledge about the tools most appropriate for talent devel- 
opment assessments. Systematic comparisons of alternative in- 
struments designed to measure similar outcomes are also 
needed. Such comparisons would ideally examine such factors 
as the manner in which the tests define the concepts under investigation,
test reliability and validity (including both convergent
and discriminant validity), the suitability of the
instruments for talent development approaches, practical issues 
in administering and scoring tests, faculty and student attitudes 
toward the instruments, and usefulness of the results for institu- 
tional planning and decision making. 

3. Because of the limitations of standardized tests, institu- 
tions will undoubtedly continue to design their own assessment 
tools to supplement if not replace commercially available in- 
struments. Instruments such as the ETS Academic Profile and 
the CIRP Freshman and Follow-up Student Surveys that provide
space for optional, locally designed questions provide one
approach to combining the best of standardized and locally de- 
signed assessments. For those institutions that want to develop 
their own assessment instruments, however, especially to measure
students' cognitive development, a guide to test development
in this area would be most helpful. A comprehensive
"how to" guide, geared to the needs of institutional researchers
and managers, would offer a useful resource that would significantly
improve the quality and cost effectiveness of local
assessment. 

4. Prospective, longitudinal, multi-institutional studies con- 
forming to accepted standards of quasi-experimental design 
continue to be sorely lacking in the literature, especially when 
the focus is college students' cognitive development. Multi- 
institutional studies are essential to support analyses that inves- 
tigate the additive or interactive effects on student development 
of institutional characteristics, student characteristics, educational
curricula or programs, and student support services and
co-curricular programs. Such analyses are needed to indicate 
the factors that promote learning within particular environments 
or for particular types of students. They will also point to the 
extent to which trends and patterns observed in one environ- 
ment can be generalized to other settings and populations. Multi- 
institutional studies also offer an opportunity to increase 
statistical power and thereby compare subgroups of students 
that are usually too small to yield valid data within single- 
institution studies. 
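
As a rough illustration of the point about statistical power, the following sketch in Python compares the approximate power of a two-group comparison within a single institution against the same comparison pooled across ten institutions. It relies on a normal approximation to a two-sided, two-sample test. The function name, the subgroup sizes (15 and 150 students per group), and the standardized effect size of 0.4 are hypothetical values chosen for illustration only; they are not drawn from any study reviewed here.

```python
from statistics import NormalDist


def two_group_power(n_per_group, effect_size, alpha=0.05):
    """Approximate power of a two-sided, two-sample z-test.

    effect_size is the standardized mean difference between the groups
    (difference in means divided by the common standard deviation).
    """
    z = NormalDist()  # standard normal distribution
    z_crit = z.inv_cdf(1 - alpha / 2)
    # Expected value of the test statistic under the alternative hypothesis.
    noncentrality = effect_size * (n_per_group / 2) ** 0.5
    # Probability that the statistic falls in either rejection region.
    upper_tail = 1 - z.cdf(z_crit - noncentrality)
    lower_tail = z.cdf(-z_crit - noncentrality)
    return upper_tail + lower_tail


# Hypothetical subgroup of 15 students per group at one institution.
single_institution = two_group_power(n_per_group=15, effect_size=0.4)

# The same subgroup pooled across ten such institutions (assumed comparable).
pooled = two_group_power(n_per_group=150, effect_size=0.4)

print(f"single institution: {single_institution:.2f}")
print(f"ten institutions pooled: {pooled:.2f}")
```

Under these assumptions the single-institution comparison has a low chance of detecting the difference, while the pooled comparison has a high one. The sketch ignores clustering of students within institutions, which a real multi-institutional analysis would need to take into account.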

5. Typically, research about outcomes addresses cognitive 
and affective outcomes separately, as if they were independent 
phenomena. We encourage the development of more integrative 
approaches, considering the reciprocal relations between cogni- 
tive and affective factors. Similarly, studies that link levels of 
performance on tests of cognitive abilities to concrete behavioral
outcomes (for example, graduation or enrollment in graduate
or professional school) would be helpful for deriving
additional benefits from assessment. 

6. Although both researchers and practitioners are increas- 
ingly finding effective means to use information about out- 
comes in decision making, the need for additional attention to 
the broad issue of utilization continues. Future research might 
review literature in other applied fields, such as planning, public
policy, and environmental psychology, to determine additional
strategies for influencing decision making. Further,
although numerous researchers and practitioners have reminded
us that information about outcomes can be used in many differ- 
ent ways, we often still conclude that our research is underused 
if it does not contribute in a direct, linear, and observable man- 
ner to decision making. To help us move beyond this myth, 
development of a taxonomy of types of utilization, including 
common indicators of each type, could aid efforts to determine 
whether or how research and information about outcomes play 
a role in decision making. Finally, much of the higher education
research on this issue is based on small sample sizes.
Larger-scale surveys that assess factors associated with utilization
at a large number of institutions would contribute to an
understanding of this issue.

7. A final, perhaps naive, suggestion is that outcomes re- 
searchers develop more cooperative relationships with their col- 
leagues. We rarely take the trouble to share details of our 
failures and successes with others. As a consequence, we lose 
the opportunity to observe or to influence the manner in which 
practitioners use data about outcomes at other than our own in- 
stitutions. To learn from our failures and share our successes 
will require us to be less competitive and to develop a stronger
sense of mutual trust. We hope, in short, that researchers will
be willing to adopt a cooperative rather than adversarial ap- 
proach to the evaluation of outcomes, not only in their relation- 
ships to the subjects of evaluation but also in their relationships 
with one another. 



APPENDIX A 



SUMMARY OF COGNITIVE ASSESSMENT 
INSTRUMENTS DISCUSSED 

GENERAL EDUCATION TESTS 

Instruments Geared toward Lower-Division Students 

The ACT Assessment Program
American College Testing Program
2201 N. Dodge Street
P.O. Box 168
Iowa City, IA 52243
(319) 337-1000
195 minutes

Scholastic Aptitude Test and Test of Standard Written English
The College Board
45 Columbus Avenue
New York, NY 10023
(212) 713-8000
150 minutes and 30 minutes, respectively

General Examinations of the College-Level Examination Program
The College Board
5 general tests, 90 minutes each 

Sequential Tests of Educational Progress, Series III
CTB/McGraw-Hill
Publishers Test Service 
Del Monte Research Park 
2500 Garden Road 
Monterey, CA 93940 
(800) 538-9547 
5 tests, 40 minutes each 

Stanford Test of Academic Skills (1982 Edition) 
The Psychological Corporation
555 Academic Court 
San Antonio, TX 78204 
(512) 299-1061 
135 minutes 

Instruments Geared toward Upper-Division Students

Graduate Record Exam General Test
Educational Testing Service
Rosedale Road
Princeton, NJ 08541
(609) 921-9000
210 minutes

Academic Profile
Educational Testing Service
60 minutes or 180 minutes with optional 45-minute essay

Graduate Management Admissions Test
Educational Testing Service
240 minutes

Medical College Admission Test 
Association of American Medical Colleges 
One Dupont Circle, NW
Suite 200
Washington, DC 20036
(202) 828-0400
390 minutes

Law School Admission Test 
Law School Admission Council 
P.O. Box 2000 
Newtown, PA 18940 
(215) 968-1001
210 minutes

NTE Core Battery 
Educational Testing Service
3 tests, 120 minutes each 

NTE Pre-Professional Skills Test 
Educational Testing Service 
3 tests, 30 to 50 minutes each 

Instruments Geared toward All Levels

College Outcomes Measures Project
American College Testing Program
360 minutes (composite) or less than 180 minutes (objective)

McBer Behavioral Event Interview
Council for Adult and Experiential Learning
10840 Little Patuxent Parkway
Columbia, MD 21044
(301) 997-3535
Varies

SPECIFIC SKILLS TESTS

Instruments Geared toward Lower-Division Students

English Composition Test with Essay
The College Board
60 minutes

Nelson-Denny Reading Test, Forms E and F
Riverside Publishing Company 
8420 Bryn Mawr Avenue 
Chicago, IL 60631 
(800) 323-9540 
35 minutes 

Writing Proficiency Program
CTB/McGraw-Hill
30 to 50 minutes for multiple choice; 30 to 50 minutes per essay

Instruments Geared toward Upper-Division Students

Western Michigan English Qualifying Examination
Bernadine Carlson
c/o Western Michigan University
720 Sprau Tower
Kalamazoo, MI 49008
100 minutes

Doppelt Mathematical Reasoning Test 
The Psychological Corporation 

50 minutes 

Miller Analogies Test 
The Psychological Corporation 

50 minutes 

Instruments Geared toward All Levels 
Watson-Glaser Critical Thinking Appraisal, Forms A and B 
The Psychological Corporation 

50 minutes 

Cornell Critical Thinking Test, Level Z 
Midwest Publications 
P.O. Box 448
Pacific Grove, CA 93950 
(408) 375-2455 
50 minutes 

Goyer Organization of Ideas Test, Form S
Robert S. Goyer 
Department of Communication 
Arizona State University 
Tempe, AZ 85287
(602) 965-5095
40 to 60 minutes

SUBJECT MATTER COMPETENCY TESTS

Instruments Geared toward Lower-Division Students

Advanced Placement Program of the College Entrance Examination Board
The College Board
26 tests, up to 180 minutes each

College Board Achievement Tests 
The College Board 
14 tests, 60 minutes each 

Instruments Geared toward Upper-Division Students 
Graduate Record Exam Subject Tests 

Educational Testing Service 

17 tests, 170 minutes each 

National Teacher Examination Specialty Area Tests 
Educational Testing Service 
26 tests, 120 minutes each 

Instruments Geared toward All Levels 
ACT Proficiency Examination Program 

American College Testing Program 

49 tests, 180 to 420 minutes each 

College-Level Examination Program Subject Examinations 
The College Board 
46 tests, 90 minutes each 

Cooperative Examination Program of the American Chemical Society
American Chemical Society
1155 Sixteenth Street, NW
Washington, DC 20036
(202) 872-4600
55 minutes

Nelson-Denny Reading Test, 50
New York State Regents External Degree Program, 57 
NIE (see National Institute of Education) 
Northeast Missouri State University, 28, 34 
NTE (see National Teacher Examinations) 

O

Organic Chemistry examination (ACS), 58 

Orientation, 6 

Outcomes 

aesthetic response, 23 

affective, 13, 21, 23 

analysis, 23, 25 

avoidance of negative, 21 

cognitive, 13, 21, 23, 24 

communications, 23 

critical thinking, 25 

definition, 19 

economic, 20 

emotional/moral development, 21, 23 
environmental responsibility, 23 
human characteristics, 20 
involvement in world, 23 
knowledge/technology/art form, 20
measures of, 21 
nonstudent, 4 

practical competence, 21, 23 
problem solving, 23, 25 
resource/service provision, 20 
satisfaction, 21 
social interaction, 23 
student, 4 

taxonomies, 12-13, 19-23 
time, 22 
type, 21-22 
valuing, 23 
variables, 21 
writing ability, 25 
Outcomes assessment 

goals overview, 1-13 






guidelines, 89-90 
measurement issues, 25-35

Recruitment, 9

State mandates, 2, 4
Statistical regression, 32-33
STEP (see Sequential Tests of Educational Progress)
Student services

baseline data, 8 

evaluation, 5-^, 17 

information gathering, 6-7 
Students 

age, 31 

at risk, 7 

development information, 2 
entering, 16, 57 
experiences, 2, 31 
freshmen, 28, 33 
involvement, 17-18, 79, 80, 89 
lower division, 39-43, 48-51 


